Abstract


Concentration of measure is a phenomenon in which a random variable that depends 
in a smooth way on a large number of independent random variables is essentially 
constant. The random variable will "concentrate" around its median or expectation.
In this work, we explore several theories and applications of concentration of measure. 
The results of the thesis are divided into three main parts. In the first part, we 
explore concentration of measure for several random operator compressions and for 
the length of the longest increasing subsequence of a random walk evolving under the 
asymmetric exclusion process, by generalizing an approach of Chatterjee and Ledoux. 
In the second part, we consider the mixed matrix moments of the complex Ginibre 
ensemble and relate them to the expected overlap functions of the eigenvectors as 
introduced by Chalker and Mehlig. In the third part, we develop a q-Stirling's formula
and discuss a method for simulating a random permutation distributed according to
the Mallows measure. We then apply the q-Stirling's formula to obtain asymptotics
for a four square decomposition of points distributed in a square according to the 
Mallows measure. All of the results in the third part are preliminary steps toward 
bounding the fluctuations of the length of the longest increasing subsequence of a 
Mallows permutation. 
1 Introduction


The idea of concentration of measure was first introduced by Milman in the asymptotic
theory of Banach spaces [Milman and Schechtman, 1986]. The phenomenon
occurs geometrically only in high dimensions, or probabilistically for a large number
of random variables with sufficient independence between them. For an overview of
the history and some standard results, see [Ledoux, 2005].

An illustrative geometric example of concentration of measure occurs for the standard
n-sphere $\mathbb{S}^n$ in $\mathbb{R}^{n+1}$. If we let $\mu_n$ denote the uniform measure on $\mathbb{S}^n$, then for
large enough n, $\mu_n$ is highly concentrated around the equator.


To see exactly what we mean by "highly concentrated", let us consider any measurable
set A on $\mathbb{S}^n$ such that $\mu_n(A) \ge 1/2$. Then, if we let d(x, A) be the geodesic
distance between $x \in \mathbb{S}^n$ and A, we define the expanded set

$$A_r = \{x \in \mathbb{S}^n : d(x, A) < r\}.$$

$A_r$ contains all points of A in addition to any points on $\mathbb{S}^n$ with a geodesic distance
less than r from A. The precise inequality that can be obtained says that

$$\mu_n(A_r) \ge 1 - e^{-(n-1)r^2/2}.$$

In other words, "almost" all points of the sphere are within a distance of order $1/\sqrt{n}$ of our
set A. Obviously, as $n \to \infty$, this quantity becomes infinitesimal. This example is due
to Gromov, and more discussion can be found in [Gromov, 1980].


Gromov's work on concentration on the sphere was inspired by Lévy's work [Lévy
and Pellegrino, 1951] on concentration of functions. Suppose we have a function F,
which is continuous on $\mathbb{S}^n$ with a modulus of continuity given by
$\omega_F(t) = \sup\{|F(x) - F(y)| : d(x, y) \le t\}$. Let $m_F$ be a median for F, which by definition means that
$\mu_n(F \ge m_F) \ge 1/2$ and $\mu_n(F \le m_F) \ge 1/2$. Then we have

$$\mu_n(\{|F - m_F| \ge \omega_F(t)\}) \le 2e^{-(n-1)t^2/2}.$$


While these geometric examples give a nice introduction to the phenomenon, in
this work we will mainly be interested in concentration of measure in a probabilistic
setting. Let us give a simple example that will give some intuition about how
concentration of measure comes up in probability. Suppose we have independent
random variables $X_1, X_2, \ldots, X_n$. Suppose that they take the values 1 and −1, each
with probability 1/2. For each $n \ge 1$, let $S_n = \sum_{i=1}^{n} X_i$. Since $\mathbb{E}|X_i| < \infty$ (in fact
$\mathbb{E}(X_i) = 0$), the strong law of large numbers tells us that $S_n/n$ converges almost
surely to $\mathbb{E}(X_1)$ as $n \to \infty$. Remember that this means that

$$\mathbb{P}\left(\lim_{n\to\infty}\frac{S_n}{n} = \mathbb{E}(X_1)\right) = 1.$$

Moreover, by the central limit theorem, we know that

$$\frac{S_n}{\sqrt{n}} \xrightarrow{d} \mathcal{N}(0, \sigma^2),$$

where $\sigma^2$ is the variance of each $X_i$, which in this case is 1. This shows us that the
fluctuations of $S_n$ are of order $\sqrt{n}$. However, notice that $|S_n|$ can take values as large
as n. If we measure $S_n$ using this scale, then $S_n/n$ is essentially zero. The actual bound
looks like

$$\mathbb{P}\left(\frac{|S_n|}{n} \ge r\right) \le 2e^{-nr^2/2}$$

for r > 0. See [Talagrand, 1996] for a proof. As Talagrand points out, concentration
of measure appears in a probabilistic setting by showing that one random variable 
that depends in a smooth enough way on many other independent random variables 
is close to constant, provided that it does not depend too much on any one of the 
independent random variables. As we will see later in this work, it turns out that this 
idea still holds true if we have a random variable that depends on a large number of
"almost" independent random variables. We will later see an instance of a random
variable that depends on many weakly correlated random variables. It requires a 
little more work to prove concentration of measure, but often, it is still possible. 
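
The coin-flip example above can be checked exactly for small n. The following is a minimal sketch, assuming the tail bound takes the Hoeffding form $2e^{-nr^2/2}$; the function names `exact_tail` and `hoeffding_bound` are illustrative and not part of the original text. It computes $\mathbb{P}(|S_n|/n \ge r)$ from the binomial distribution and compares it with the bound.

```python
from math import comb, exp

def exact_tail(n, r):
    """Exact P(|S_n|/n >= r) when S_n is a sum of n independent +-1 signs."""
    # If k of the signs equal +1, then S_n = 2k - n.
    favourable = sum(comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= r * n)
    return favourable / 2 ** n

def hoeffding_bound(n, r):
    """Assumed form of the bound: 2 * exp(-n r^2 / 2)."""
    return 2 * exp(-n * r * r / 2)

for n in (10, 100, 1000):
    for r in (0.1, 0.2, 0.5):
        print(n, r, exact_tail(n, r), hoeffding_bound(n, r))
```

For fixed r, both columns decay exponentially in n, which is the sense in which $S_n/n$ is "essentially zero" on the scale n.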

This work is divided into chapters. Chapter 2 introduces Talagrand's Gaussian
concentration of measure inequality, Talagrand's isoperimetric inequality, Ledoux's
concentration of measure on Markov chains, and the Euler-Maclaurin formula. A
statement and proof of each theorem (with the exception of the Euler-Maclaurin
formula) is also given, to make this work as self-contained as possible. In later chapters,
we will see new applications of each of these results. Chapter 3 introduces several
new results using Ledoux's concentration of measure inequality on reversible Markov
chains. We are able to generalize a method first used by Chatterjee and Ledoux
[Chatterjee and Ledoux, 2009] to prove concentration of measure for two different random
operator compressions. We also show how to use this method to obtain concentration
of measure bounds for the length of the longest increasing subsequence of a random
walk evolving under the asymmetric exclusion process. To give more meaning to our
fluctuation bounds, we also derive a lower bound for the length of this longest increasing
subsequence. It turns out that we can use Talagrand's isoperimetric inequality
to do this, even though our random variables have weak correlations. In Chapter
4, we discuss a method for calculating the mixed matrix moments in the Ginibre
random matrix ensemble using techniques from spin glasses. In addition, we use the
mixed matrix moments to compute asymptotics of the overlap functions (introduced
by Chalker and Mehlig [Chalker and Mehlig, 1998]) for eigenvectors corresponding
to eigenvalues near the edge of the unit circle. We propose an adiabatic method
for computing explicit formulas for the eigenvector overlap functions. In Chapter 5,
we use the Euler-Maclaurin formula to prove a q-deformed Stirling's formula. We
demonstrate a use of the q-Stirling's formula to obtain asymptotics for point counts
in a four square problem. We also discuss techniques and algorithms to simulate a
Mallows random permutation.
2 Concentration of Measure Results and other Necessary Background


2.1 Talagrand's Gaussian Concentration of Measure Inequality

Michel Talagrand has made numerous contributions to the theory of concentration
of measure. The first concentration of measure result that we will present applies to
Lipschitz functions of Gaussian random variables, so we will refer to it henceforth
as Talagrand's Gaussian concentration of measure inequality, to distinguish it from
other results of Talagrand that we will use. Before stating the theorem, recall that a
Lipschitz function F on $\mathbb{R}^M$ with Lipschitz constant A satisfies

$$|F(x) - F(y)| \le A\|x - y\|,$$

where $\|x - y\|$ is the Euclidean distance between x and y. The following theorem is
due to Talagrand [Talagrand, 2003].

Theorem 2.1.1. Consider a Lipschitz function F on $\mathbb{R}^M$ with Lipschitz constant
A. Let $x_1, \ldots, x_M$ denote independent standard Gaussian random variables, and let
$x = (x_1, \ldots, x_M)$. Then for each t > 0, we have

$$\mathbb{P}(|F(x) - \mathbb{E}F(x)| \ge t) \le 2\exp\left(-\frac{t^2}{4A^2}\right) \qquad (2.1)$$

Proof. For this proof, we will assume that F is not only Lipschitz, but also twice
differentiable. This is the case in most applications of this theorem, and if it is not
the case, we can regularize F by convolving it with a smooth function to solve the
problem. We begin with a parameter s and consider a function G on $\mathbb{R}^{2M}$ defined as

$$G(z_1, \ldots, z_{2M}) = \exp\big(s(F(z_1, \ldots, z_M) - F(z_{M+1}, \ldots, z_{2M}))\big).$$

Let $u_1, \ldots, u_{2M}$ be 2M independent standard Gaussian random variables. Let $v_1, \ldots, v_{2M}$
also be 2M random variables (independent of the $u_1, \ldots, u_{2M}$) such that the first M
$(v_1, \ldots, v_M)$ are independent standard Gaussians and such that the second M variables
$(v_{M+1}, \ldots, v_{2M})$ are copies of the first M v's, in order (i.e. $v_i = v_{i+M}$ if $i \le M$).
Notice that due to the independence of the u's and the first M v's, we have

$$\mathbb{E}u_iu_j - \mathbb{E}v_iv_j = 0$$

except when j = i + M or i = j + M, in which case we have

$$\mathbb{E}u_iu_j - \mathbb{E}v_iv_j = 0 - 1 = -1.$$

We consider a function $f(t) = (f_1, \ldots, f_{2M})(t)$ given by

$$f_i(t) = \sqrt{t}\,u_i + \sqrt{1-t}\,v_i.$$

Note that f(0) = v and that f(1) = u. Also, consider

$$\varphi(t) = \mathbb{E}G(f(t))$$
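so that

$$\varphi'(t) = \sum_{i=1}^{2M}\mathbb{E}\left(\frac{d}{dt}f_i(t)\,\frac{\partial G}{\partial z_i}(f(t))\right).$$

To simplify $\varphi'(t)$, recall the Gaussian integration by parts formula. For Gaussian
random variables $y, y_1, \ldots, y_n$ and a function F (of moderate growth at infinity), we
have

$$\mathbb{E}\big(yF(y_1, \ldots, y_n)\big) = \sum_{i=1}^{n}\mathbb{E}(yy_i)\,\mathbb{E}\frac{\partial F}{\partial x_i}(y_1, \ldots, y_n).$$

(See [Talagrand, 2003], Appendix 6 for a proof.)

Using the fact that

$$\frac{d}{dt}f_i(t) = \frac{1}{2\sqrt{t}}u_i - \frac{1}{2\sqrt{1-t}}v_i$$

and applying Gaussian integration by parts gives

$$\varphi'(t) = \sum_{i,j=1}^{2M}\mathbb{E}\left[\left(\frac{1}{2\sqrt{t}}u_i - \frac{1}{2\sqrt{1-t}}v_i\right)\left(\sqrt{t}\,u_j + \sqrt{1-t}\,v_j\right)\right]\mathbb{E}\frac{\partial^2 G}{\partial z_i\partial z_j}(f(t)).$$

Using the independence of the u's and the v's, we have that

$$\mathbb{E}\left[\left(\frac{1}{2\sqrt{t}}u_i - \frac{1}{2\sqrt{1-t}}v_i\right)\left(\sqrt{t}\,u_j + \sqrt{1-t}\,v_j\right)\right] = \frac{1}{2}\left(\mathbb{E}u_iu_j - \mathbb{E}v_iv_j\right),$$

where the difference $\mathbb{E}u_iu_j - \mathbb{E}v_iv_j$ has already been determined to be 0 unless j = i + M or i = j + M (in
which case it is −1), so we have

$$\varphi'(t) = -\mathbb{E}\sum_{i=1}^{M}\frac{\partial^2 G}{\partial z_i\partial z_{i+M}}(f(t)).$$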


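Computing the second derivative gives

$$\frac{\partial^2 G}{\partial z_i\partial z_{i+M}}(z) = -s^2\,\frac{\partial F}{\partial x_i}(z_1, \ldots, z_M)\,\frac{\partial F}{\partial x_i}(z_{M+1}, \ldots, z_{2M})\,G(z).$$

Since F is Lipschitz, we know that for all $x \in \mathbb{R}^M$,

$$\sum_{i=1}^{M}\left(\frac{\partial F}{\partial x_i}(x)\right)^2 \le A^2,$$

so we can use the Cauchy-Schwarz inequality to get

$$\varphi'(t) \le s^2A^2\varphi(t).$$

Notice that $\varphi(0) = 1$ (since at t = 0 the u's disappear and the second half of the v's
cancel the first half). Hence we have

$$\varphi'(t)/\varphi(t) \le s^2A^2,$$

so

$$\log(\varphi(t)) \le s^2A^2t + C$$

with $C = \log\varphi(0) = 0$, or

$$\varphi(t) \le e^{s^2A^2t},$$

and

$$\varphi(1) \le \exp(s^2A^2).$$

Recalling that $f_i(1) = u_i$, this tells us that

$$\mathbb{E}\exp\big(s(F(u_1, \ldots, u_M) - F(u_{M+1}, \ldots, u_{2M}))\big) \le e^{s^2A^2}.$$

By independence of the u's, we have that

$$\mathbb{E}\exp\big(s(F(u_1, \ldots, u_M) - F(u_{M+1}, \ldots, u_{2M}))\big) = \mathbb{E}\exp(sF(u_1, \ldots, u_M))\,\mathbb{E}\exp(-sF(u_{M+1}, \ldots, u_{2M})). \qquad (2.2)$$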
By Jensen's inequality, we know that

$$\mathbb{E}\exp(-sF(u_{M+1}, \ldots, u_{2M})) \ge \exp(-s\,\mathbb{E}F(u_{M+1}, \ldots, u_{2M}))$$

for s > 0.

Putting this all together, we have

$$\mathbb{E}\exp\big(s(F(x) - \mathbb{E}F(x))\big) \le e^{s^2A^2},$$

where x is a length M vector of independent standard Gaussian random variables.
By Markov's inequality,

$$\mathbb{P}(F(x) - \mathbb{E}F(x) \ge t) = \mathbb{P}\big(e^{s(F(x) - \mathbb{E}F(x))} \ge e^{st}\big) \le e^{s^2A^2 - st}.$$

Letting $s = t/(2A^2)$, we have

$$\mathbb{P}(F(x) - \mathbb{E}F(x) \ge t) \le \exp\left(-\frac{t^2}{4A^2}\right).$$

We can then apply the same inequality to −F and we will have our result.

□


It is worth noting that the method used to prove this result is quite important.
Talagrand [Talagrand, 2003] refers to this method of proof as the "smart path method".
This method can be applied to a variety of problems. Notice that we found a "path"
(namely our function f), which took us between the situation that we wanted to study
and a simpler situation. Beyond choosing an appropriate path, the only real work left
to do was to get bounds on the derivatives along the path. Talagrand points out that
although this method leads to an elegant proof, the choice of path is highly important
and nontrivial. Often the choice is not obvious and can be found only after a careful
study of the structure of the problem.
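
Theorem 2.1.1 can be sanity checked by simulation. The sketch below is illustrative only: it assumes the particular test function $F(x) = \|x\|$ (which is Lipschitz with constant A = 1), uses a Monte Carlo average in place of $\mathbb{E}F(x)$, and the helper names are hypothetical. It compares the empirical tail of $|F(x) - \mathbb{E}F(x)|$ with the bound $2\exp(-t^2/(4A^2))$.

```python
import math
import random

def gaussian_norm_samples(M, n_samples, seed=0):
    """Sample F(x) = ||x|| for x a vector of M independent standard Gaussians."""
    rng = random.Random(seed)
    return [math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(M)))
            for _ in range(n_samples)]

def empirical_tail(samples, t):
    """Fraction of samples deviating from the sample mean by at least t."""
    mean = sum(samples) / len(samples)  # Monte Carlo stand-in for E F(x)
    return sum(1 for f in samples if abs(f - mean) >= t) / len(samples)

def gaussian_bound(t, A=1.0):
    """Right-hand side of (2.1): 2 exp(-t^2 / (4 A^2))."""
    return 2.0 * math.exp(-t * t / (4.0 * A * A))

samples = gaussian_norm_samples(M=50, n_samples=2000)
for t in (1.0, 2.0, 3.0):
    print(t, empirical_tail(samples, t), gaussian_bound(t))
```

Note that the bound does not depend on the dimension M; only the Lipschitz constant enters.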
2.2 Talagrand's Isoperimetric Inequality


The concentration of measure inequality presented in this section is also due to
Talagrand. In [Talagrand, 1995], a theory of isoperimetric inequalities on product spaces
is developed. The theorem presented here is just one of the many isoperimetric
inequalities proved and applied in that work. Once the necessary notions of distance
are defined and the theorem proved, the applications of the theorem are vast and
obtained quickly. Before stating the theorem, we need to set up our product space
and define a special notion of distance on the space.

We will begin with a probability space which we will denote by $(\Omega, \mathcal{F}, P)$. To give
an idea of what we mean when we talk about a product probability space, we will
give an example of the product of two probability spaces.

Suppose that we have two probability spaces given by $(\Omega_1, \mathcal{F}_1, P_1)$ and $(\Omega_2, \mathcal{F}_2, P_2)$.
We want to form a product space which is the "product" of these two probability
spaces. For ease of notation, we will usually just denote the product space by $\Omega_1 \times \Omega_2$,
leaving the sigma algebras and the measures implicit. Our new sample space is just
the Cartesian product $\Omega_1 \times \Omega_2$. The new sigma algebra is given by the tensor product
$\mathcal{F}_1 \otimes \mathcal{F}_2$. We define the product measure $P_1 \times P_2$ by $(P_1 \times P_2)(E_1 \times E_2) = P_1(E_1)P_2(E_2)$
for all $E_1 \in \mathcal{F}_1$ and $E_2 \in \mathcal{F}_2$. We can then define a product of n probability spaces
by extending this notion.

Given our probability space $(\Omega, \mathcal{F}, P)$, we will be considering the product space
$\Omega^n$. Given $A \subset \Omega^n$, Talagrand's isoperimetric inequality gives us bounds on the
measure of the set of points that are within a specified distance of this set A. Before
we can state the inequality, we need to develop a notion of distance.


For $x \in \Omega^n$ and $A \subset \Omega^n$, we define Talagrand's convex distance to be

$$d_T(x, A) = \min\left\{t : \forall\{\alpha_i\},\ \exists y \in A \text{ such that } \sum_{i=1}^{n}\alpha_i 1(x_i \ne y_i) \le t\left(\sum_{i=1}^{n}\alpha_i^2\right)^{1/2}\right\},$$

where the weights $\alpha_i$ are nonnegative. The similarity between $d_T(x, A)$ and Hamming's distance

$$d_H(x, A) = \inf_{y \in A}\sum_{i=1}^{n}1(x_i \ne y_i)$$

should be noted. Notice that if all $\alpha_i = n^{-1/2}$, then Talagrand's convex distance
is always at least as large as $n^{-1/2}$ times the Hamming distance. One of the main
reasons that we use $d_T(x, A)$ instead of $d_H(x, A)$ is that $d_T(x, A)$ not only allows
us to weight the summands differently, it allows us to choose weights that explicitly
depend on the values of the $x_i$. This flexibility allows the inequality to be applied to
a much wider range of problems.

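A second (and equivalent) way of defining Talagrand's convex distance is by

$$d_T(x, A) = \sup_{\alpha \ge 0,\ \|\alpha\|_2 = 1}\ \min_{y \in A}\sum_{i=1}^{n}\alpha_i 1(x_i \ne y_i) \qquad (2.3)$$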


To gain a bit of understanding about the convex distance, let us look at a simple
example. Suppose that we are working in one dimension. Let $x \in \mathbb{R}$ and let our set
A just be {y}, the set containing only the point $y \in \mathbb{R}$. Then

$$d_T(x, \{y\}) = \min\{t \ge 0 : \forall\alpha \in \mathbb{R}_{\ge 0},\ \alpha 1(x \ne y) \le t\|\alpha\|\} = \begin{cases}1 & \text{if } y \ne x\\ 0 & \text{if } y = x\end{cases}$$

Given $A \subset \Omega^n$, we define

$$A_t = \{x \in \Omega^n : d_T(x, A) \le t\}.$$

In other words, $A_t$ is the set of all points that are within a distance t of A. The
following inequality can be found in [Talagrand, 1995] and tells us that for a set A of
"reasonable size", $P(A_t)$ is close to 1.
Theorem 2.2.1. For every $A \subset \Omega^n$, we have

$$\int\exp\left(\frac{1}{4}d_T^2(x, A)\right)dP(x) \le \frac{1}{P(A)} \qquad (2.4)$$

and consequently

$$P(d_T(x, A) \ge t) \le \frac{e^{-t^2/4}}{P(A)} \qquad (2.5)$$

and

$$1 - P(A_t) \le \frac{e^{-t^2/4}}{P(A)}. \qquad (2.6)$$

It should be noted that the proof given here more closely follows the proof as
given in [Steele, 1997] as opposed to [Talagrand, 1995]. The method is basically the
same as Talagrand's original proof, although some components are presented slightly
differently and appear in a different order.


Before we can begin the proof of the theorem, we need a deeper understanding
and a different characterization of the convex distance. We will begin by defining a
set $U_A(x)$. Elements of this set will be elements of $\mathbb{R}^n$ containing only 0's and 1's. We
will begin with the set $U'_A(x)$, which is the set of all vectors
$u_y = (1(x_1 \ne y_1), 1(x_2 \ne y_2), \ldots, 1(x_n \ne y_n))$ for $y \in A$. We then let $U_A(x)$ be the set which includes all of
these vectors in addition to all vectors we can obtain from $U'_A(x)$ by switching some
of the 0's to 1's. In other words, $u \in U_A(x)$ if and only if $u - u_y \ge 0$ (componentwise) for some $y \in A$. We
then define the set $V_A(x)$ to be the convex hull of $U_A(x)$. By convex hull, we mean
the set of all convex combinations of vectors in $U_A(x)$. We then have the following
dual characterization of $d_T(x, A)$.

Proposition 2.2.2.

$$d_T(x, A) = \min\{\|v\|_2 : v \in V_A(x)\}$$


Proof. Begin with

$$\min_{y \in A}\sum_{i=1}^{n}\alpha_i 1(x_i \ne y_i).$$

Using the definition of $U_A(x)$ (and the nonnegativity of the $\alpha_i$), this is

$$\min_{u \in U_A(x)}\sum_{i=1}^{n}\alpha_iu_i.$$

Using the fact that the minimum of a linear functional on a convex set is equal to
the minimum over the set of extreme points, we have that the above is equal to

$$\min_{v \in V_A(x)}\sum_{i=1}^{n}\alpha_iv_i.$$

Next, we apply the Cauchy-Schwarz inequality to get

$$\min_{v \in V_A(x)}\sum_{i=1}^{n}\alpha_iv_i \le \min_{v \in V_A(x)}\left(\sum_{i=1}^{n}\alpha_i^2\right)^{1/2}\left(\sum_{i=1}^{n}v_i^2\right)^{1/2} = \left(\sum_{i=1}^{n}\alpha_i^2\right)^{1/2}\min\{\|v\|_2 : v \in V_A(x)\}.$$

Recalling Talagrand's convex distance,

$$d_T(x, A) = \min\left\{t : \forall\{\alpha_i\},\ \exists y \in A \text{ such that } \sum_{i=1}^{n}\alpha_i 1(x_i \ne y_i) \le t\left(\sum_{i=1}^{n}\alpha_i^2\right)^{1/2}\right\},$$

we immediately have

$$d_T(x, A) \le \min\{\|v\|_2 : v \in V_A(x)\}$$

by our last inequality.

Now we need to prove the reverse inequality. By the linear functional characterization
of the Euclidean norm, there is an α with $\|\alpha\|_2 = 1$ such that for all $v \in V_A(x)$,
we have

$$\sum_{i=1}^{n}\alpha_iv_i \ge \min\{\|v\|_2 : v \in V_A(x)\}.$$

By definition of $V_A(x)$, this implies that for all $y \in A$, we have

$$\sum_{i=1}^{n}\alpha_i 1(x_i \ne y_i) \ge \min\{\|v\|_2 : v \in V_A(x)\}.$$

Using equation (2.3), this immediately implies the reverse inequality.

□


Now that this result is established, we can prove Talagrand's isoperimetric
inequality.

Proof (of Theorem 2.2.1). To prove the theorem, we will use induction on the dimension
of the product space. To prove the base case, we will start with n = 1. In this
case we have

$$d_T(x, A) = \min\{\|v\|_2 : v \in V_A(x)\},$$

which is

$$d_T(x, A) = \begin{cases}1 & \text{if } x \notin A\\ 0 & \text{if } x \in A\end{cases}$$

Plugging this into the integral from Talagrand's theorem, we have

$$\int\exp\left(\frac{1}{4}d_T^2(x, A)\right)dP(x) = \int_A dP(x) + \int_{A^c}e^{1/4}\,dP(x) = P(A) + e^{1/4}(1 - P(A)).$$

If we can show that this quantity is $\le 1/P(A)$, the base case will be proved. This
will be relatively easy to show, using a little bit of calculus. For ease of notation, let
p = P(A). Then (multiplying both sides of the inequality by p), we need to prove
that

$$p^2 + e^{1/4}p(1 - p) \le 1.$$

To prove this, we will take derivatives to find the p that maximizes $p^2 + e^{1/4}p(1 - p) - 1$.
Taking the derivative and solving for p gives $p \approx 2.26$. Since the coefficient of $p^2$,
namely $1 - e^{1/4}$, is negative, this critical point is a maximum and the expression is
increasing to its left. Hence, on the interval [0, 1],
$p^2 + e^{1/4}p(1 - p)$ obtains its maximum at 1. At p = 1, the inequality is satisfied, so
the base case is proved.

Now we will proceed with the inductive step. Assume that for any $A \subset \Omega^n$, we
have

$$\int\exp\left(\frac{1}{4}d_T^2(x, A)\right)dP(x) \le \frac{1}{P(A)},$$

where P(A) now represents our product measure on $\Omega^n$. We need to check and
make sure that the inequality holds for dimension n + 1. We start with an arbitrary
$A \subset \Omega^{n+1}$. We will begin by writing $\Omega^{n+1}$ as $\Omega^n \times \Omega$. Let $x \in \Omega^n$ and $\omega \in \Omega$. Then
$(x, \omega) \in \Omega^{n+1}$. We will consider two different sets. Following Steele, we will define
the following as the ω section of A, given by

$$A(\omega) = \{x : (x, \omega) \in A\} \subset \Omega^n,$$

and the projection of A, given by

$$B = \bigcup_{\omega \in \Omega}A(\omega) = \{x : \exists\,\omega \text{ with } (x, \omega) \in A\} \subset \Omega^n.$$

To prove the theorem, we will show that $d_T((x, \omega), A)$ (in n + 1 dimensions) can be bounded
in terms of the convex distances for the ω sections and the projections. To do this,
we prove the following lemma.

Lemma 2.2.3. For all $0 \le t \le 1$ and A(ω) and B as defined above, we have

$$d_T^2((x, \omega), A) \le t\,d_T^2(x, A(\omega)) + (1 - t)\,d_T^2(x, B) + (1 - t)^2 \qquad (2.7)$$

Using the alternative characterization of $d_T$ from Proposition 2.2.2, we can find
vectors $v_1 \in V_{A(\omega)}(x)$ and $v_2 \in V_B(x)$ such that $d_T(x, A(\omega)) = \|v_1\|$ and $d_T(x, B) = \|v_2\|$.
First we note that $(v_1, 0) \in V_A(x, \omega)$. To see this, use the fact that $v_1$ is a convex
combination of vectors u in $U_{A(\omega)}(x)$, and for each such u there is a $y \in A(\omega)$ with
$u_i \ge 1(x_i \ne y_i)$ for all i. If $y \in A(\omega)$, we know that $(y, \omega) \in A$, and the last
coordinate of the corresponding indicator vector is $1(\omega \ne \omega) = 0$. Hence each (u, 0) lies in
$U_A(x, \omega)$, and so $(v_1, 0) \in V_A(x, \omega)$.

Next, we note that $(v_2, 1) \in V_A(x, \omega)$. This follows immediately from the fact that
starting from a vector in $U'_A(x, \omega)$, we can always change 0's to 1's and remain in
$U_A(x, \omega)$, and hence in $V_A(x, \omega)$.

Since $V_A(x, \omega)$ is convex,

$$t(v_1, 0) + (1 - t)(v_2, 1) = (tv_1 + (1 - t)v_2,\ 1 - t)$$

is also in $V_A(x, \omega)$. Notice that by our alternative characterization of $d_T$,
$\|(tv_1 + (1 - t)v_2,\ 1 - t)\|$ is an upper bound on $d_T((x, \omega), A)$. By convexity of the square,

$$\|(tv_1 + (1 - t)v_2,\ 1 - t)\|^2 = \sum_{i=1}^{n}\left(tv_1^{(i)} + (1 - t)v_2^{(i)}\right)^2 + (1 - t)^2 \le t\|v_1\|_2^2 + (1 - t)\|v_2\|_2^2 + (1 - t)^2.$$

Using $d_T(x, A(\omega)) = \|v_1\|$ and $d_T(x, B) = \|v_2\|$, we have proved the lemma.
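Keeping ω fixed, define $I_n(\omega)$ to be the n-fold integral given by

$$I_n(\omega) = \int\exp\left(\frac{1}{4}d_T^2((x, \omega), A)\right)dP(x).$$

Using Lemma 2.2.3, we have that

$$I_n(\omega) \le \int\exp\left(\frac{1}{4}\left(t\,d_T^2(x, A(\omega)) + (1 - t)\,d_T^2(x, B) + (1 - t)^2\right)\right)dP(x)$$

$$= \exp\left(\frac{(1 - t)^2}{4}\right)\int\exp\left(\frac{t}{4}d_T^2(x, A(\omega))\right)\exp\left(\frac{1 - t}{4}d_T^2(x, B)\right)dP(x). \qquad (2.8)$$

Recall that Hölder's inequality says that, for p, q > 1 with 1/p + 1/q = 1,

$$\int|fg|\,d\mu \le \left(\int|f|^p\,d\mu\right)^{1/p}\left(\int|g|^q\,d\mu\right)^{1/q}.$$

Applying this to (2.8) with p = 1/t and q = 1/(1 − t) gives

$$I_n(\omega) \le \exp\left(\frac{(1 - t)^2}{4}\right)\left(\int\exp\left(\frac{1}{4}d_T^2(x, A(\omega))\right)dP(x)\right)^{t}\left(\int\exp\left(\frac{1}{4}d_T^2(x, B)\right)dP(x)\right)^{1 - t}.$$

Now, by the induction hypothesis, we have

$$I_n(\omega) \le \exp\left(\frac{(1 - t)^2}{4}\right)\left(\frac{1}{P(A(\omega))}\right)^{t}\left(\frac{1}{P(B)}\right)^{1 - t} = \frac{1}{P(B)}\left(\frac{P(A(\omega))}{P(B)}\right)^{-t}\exp\left(\frac{(1 - t)^2}{4}\right). \qquad (2.9)$$

Since $A(\omega) \subset B$, we know that $P(A(\omega)) \le P(B)$. In order to complete the proof
of the theorem, we need the following lemma.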

Lemma 2.2.4. For all $0 \le r \le 1$, we have

$$\inf_{0 \le t \le 1} r^{-t}\exp\left(\frac{1}{4}(1 - t)^2\right) \le 2 - r.$$

The proof of this lemma is essentially a calculus exercise, but we will give an
outline. Taking the derivative of $r^{-t}\exp\left(\frac{1}{4}(1 - t)^2\right)$ with respect to t gives

$$-r^{-t}\exp\left(\frac{1}{4}(1 - t)^2\right)\left(\ln(r) + \frac{1}{2}(1 - t)\right).$$

Optimizing in t gives t = 1 + 2 ln(r), which can be shown to be a minimum. (This value
lies in [0, 1] when $r \ge e^{-1/2}$; for smaller r one can take t = 0, where the value $e^{1/4}$ is
already below 2 − r.) Plugging
back into $r^{-t}\exp\left(\frac{1}{4}(1 - t)^2\right)$ gives

$$r^{-(1 + 2\ln(r))}\exp\left(\frac{1}{4}(1 - 1 - 2\ln(r))^2\right).$$

After some simplification, we get that the above

$$= r^{-(\ln(r) + 1)}.$$

To show that $r^{-(\ln(r) + 1)} \le 2 - r$, we just need to show that $r + r^{-(\ln(r) + 1)} \le 2$. Using
calculus, one can show that $r + r^{-(\ln(r) + 1)}$ is increasing on 0 < r ≤ 1 and equals 2 at r = 1; therefore, the
inequality is true. This concludes the proof of the lemma.
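
Lemma 2.2.4 is easy to probe numerically. The sketch below is only a check on a grid, not a proof; it assumes the choice t = max(0, 1 + 2 ln r), that is, the critical point from the outline clipped to the interval [0, 1], and the function names are illustrative. Since the infimum over t is at most the value at any particular t, it suffices that this single value is below 2 − r.

```python
import math

def optimized_value(r):
    """Value of r**(-t) * exp((1 - t)**2 / 4) at t = max(0, 1 + 2 ln r)."""
    t = max(0.0, 1.0 + 2.0 * math.log(r))
    return r ** (-t) * math.exp((1.0 - t) ** 2 / 4.0)

def lemma_holds(r, tol=1e-12):
    """Check optimized_value(r) <= 2 - r up to a small floating point tolerance."""
    return optimized_value(r) <= 2.0 - r + tol

grid = [k / 1000 for k in range(1, 1001)]
print(all(lemma_holds(r) for r in grid))
```

The inequality is tight only at r = 1, where both sides equal 1.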

Applying this lemma to equation (2.9), with $r = P(A(\omega))/P(B)$, gives

$$I_n(\omega) \le \frac{1}{P(B)}\left(2 - \frac{P(A(\omega))}{P(B)}\right).$$

This can now be integrated with respect to ω, which gives

$$\int\!\!\int\exp\left(\frac{1}{4}d_T^2((x, \omega), A)\right)dP(x)\,dP(\omega) \le \frac{1}{P(B)}\left(2 - \frac{P(A)}{P(B)}\right) = \frac{1}{P(A)}\cdot\frac{P(A)}{P(B)}\left(2 - \frac{P(A)}{P(B)}\right).$$

Notice that if we can prove that

$$\frac{P(A)}{P(B)}\left(2 - \frac{P(A)}{P(B)}\right) \le 1$$

then our proof will be complete. Letting $x = P(A)/P(B)$, we want to determine for which
x, $x(2 - x) \le 1$, or equivalently, for which x, $x^2 - 2x + 1 \ge 0$. Since this factors as
$(x - 1)^2$, this inequality is true for all x, which completes the proof. □

As previously mentioned, although the setup and proof of this theorem took a fair
amount of work, most applications of the theorem are elegant and quick. See [Steele,
1997] for a discussion and explanation of common applications. In addition, we will
see an application in a later chapter.
2.3 Ledoux's Concentration of Measure on Reversible Markov Chains


A concentration of measure result proved by Ledoux [Ledoux, 2005] turns out to
be a key foundational piece for some of the results of Chapter 3. Before stating the
result, we provide a few definitions. Following the notation in [Ledoux, 2005], we
will let $(\Pi, \mu)$ denote a Markov chain on a finite or countable set X. A Markov
chain is a stochastic process which moves between elements of X according to the
following rules: if the chain is at a given $x \in X$, the next position in the chain is
chosen according to a fixed probability distribution $\Pi(x, \cdot)$. In other words, given a
starting position $x \in X$, the probability to move from x to y is $\Pi(x, y)$. We call X
the state space and Π the transition matrix. Markov chains satisfy a "memoryless"
property. This property (called the Markov property) is stated in mathematical terms
as follows.

For notational purposes, let $S = (\Pi, \mu)$, so that $S_t$ is the current state of the chain
at some discrete time $t \ge 0$. Then for all $x, y \in X$ and events $H_{t-1} = \bigcap_{i=0}^{t-1}\{S_i = x_i\}$
satisfying $P(H_{t-1} \cap \{S_t = x\}) > 0$, we have

$$P(S_{t+1} = y \mid H_{t-1} \cap \{S_t = x\}) = P(S_{t+1} = y \mid S_t = x) = \Pi(x, y).$$

A simple explanation of this property is that the future depends only on the present,
not on the past. For a complete discussion of Markov chains and more properties, see
[Levin et al., 2009].

Furthermore, for this application, we require that the Markov chain be irreducible.
A Markov chain is irreducible if for any $x, y \in X$, there exists an integer t > 0 such
that $\Pi^t(x, y) > 0$. By the notation $\Pi^t(x, y)$, we mean $P(S_t = y \mid S_0 = x)$. In
other words, there is a positive probability of eventually going from any state to any other state.
A probability measure μ on X is called an invariant (or stationary) measure if

$$\sum_{x \in X}\mu(x)\Pi(x, y) = \mu(y)$$

for all $y \in X$. Regarding Π as a matrix (where $\Pi(x, y)$ is the (x, y)th entry) and μ as
a row vector, this is equivalent to the condition

$$\mu = \mu\Pi,$$

which perhaps gives a more intuitive idea of the measure. This is the μ that we will
refer to in the notation $(\Pi, \mu)$ for the Markov chain.

A Markov chain is reversible if

$$\mu(x)\Pi(x, y) = \mu(y)\Pi(y, x)$$

for all $x, y \in X$. This is often called the detailed balance condition.
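
The invariance and detailed balance conditions can be verified directly on a small example. The sketch below uses a hypothetical three-state birth-death chain and a deterministic cycle; both chains and all names are illustrative and are not taken from the text. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction as Fr

# Hypothetical 3-state birth-death chain (illustrative numbers only).
PI = [
    [Fr(1, 2), Fr(1, 2), Fr(0)],
    [Fr(1, 4), Fr(1, 2), Fr(1, 4)],
    [Fr(0),    Fr(1, 2), Fr(1, 2)],
]
MU = [Fr(1, 4), Fr(1, 2), Fr(1, 4)]

# Deterministic 3-cycle: the uniform measure is invariant but not reversible.
CYCLE = [
    [Fr(0), Fr(1), Fr(0)],
    [Fr(0), Fr(0), Fr(1)],
    [Fr(1), Fr(0), Fr(0)],
]
UNIFORM = [Fr(1, 3)] * 3

def is_invariant(mu, P):
    """Check sum_x mu(x) P(x, y) == mu(y) for every y, i.e. mu = mu P."""
    n = len(mu)
    return all(sum(mu[x] * P[x][y] for x in range(n)) == mu[y] for y in range(n))

def is_reversible(mu, P):
    """Check detailed balance mu(x) P(x, y) == mu(y) P(y, x) for all x, y."""
    n = len(mu)
    return all(mu[x] * P[x][y] == mu[y] * P[y][x] for x in range(n) for y in range(n))

print(is_invariant(MU, PI), is_reversible(MU, PI))
print(is_invariant(UNIFORM, CYCLE), is_reversible(UNIFORM, CYCLE))
```

Summing the detailed balance condition over x shows that reversibility implies invariance; the cycle shows that the converse fails.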

From now on, we will assume that $(\Pi, \mu)$ is a reversible Markov chain with transition
matrix Π and invariant measure μ. For functions f and g on X, the Dirichlet
form associated to $(\Pi, \mu)$ is given by

$$\mathcal{E}(f, g) := \langle(I - \Pi)f, g\rangle_\mu.$$

In particular,

$$\mathcal{E}(f, f) = \frac{1}{2}\sum_{x, y \in X}[f(x) - f(y)]^2\mu(x)\Pi(x, y).$$

For a proof of this fact, see [Levin et al., 2009].

We will denote the eigenvalues of Π in decreasing order by

$$1 = \eta_1 \ge \eta_2 \ge \cdots \ge \eta_{|X|} \ge -1.$$

Notice that 1 must be an eigenvalue of Π, since (letting e temporarily represent a
vector of all 1's)

$$\Pi e = e,$$

using the fact that the sum along each row of Π is 1. To see why the rest
of the eigenvalues must have magnitude at most 1, note that if $|\eta_k| > 1$ for some k,
then $\Pi^n v_k = \eta_k^n v_k$ for an eigenvector $v_k$. If $|\eta_k|^n$ is large, this contradicts the fact that
all entries of $\Pi^n$ are between 0 and 1.

The spectral gap of the Markov chain is defined by $\lambda_1 := 1 - \eta_2$. The spectral
gap relates to the Dirichlet form via the Poincaré inequality, which says that for all
functions f on X,

$$\lambda_1\mathrm{Var}_\mu(f) \le \mathcal{E}(f, f).$$

In order to work with Ledoux's concentration of measure result, we need to define a
triple norm on functions f on X. Let

$$|||f|||_\infty^2 = \frac{1}{2}\sup_{x \in X}\sum_{y \in X}|f(x) - f(y)|^2\,\Pi(x, y).$$
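
To make the Dirichlet form, the spectral gap and the triple norm concrete, here is a small sketch on a two-state chain. The chain and its parameters a, b are hypothetical choices for illustration. For a two-state chain with transition probabilities a and b, the eigenvalues are 1 and 1 − a − b, so the spectral gap is a + b, and the Poincaré inequality holds with equality.

```python
from fractions import Fraction as Fr

def two_state_chain(a, b):
    """Transition matrix and invariant measure of a two-state chain."""
    P = [[1 - a, a], [b, 1 - b]]
    mu = [b / (a + b), a / (a + b)]
    return P, mu

def dirichlet_form(f, P, mu):
    """E(f, f) = 1/2 * sum_{x,y} (f(x) - f(y))^2 mu(x) P(x, y)."""
    n = len(f)
    return Fr(1, 2) * sum((f[x] - f[y]) ** 2 * mu[x] * P[x][y]
                          for x in range(n) for y in range(n))

def variance(f, mu):
    """Variance of f under the measure mu."""
    mean = sum(fx * m for fx, m in zip(f, mu))
    return sum((fx - mean) ** 2 * m for fx, m in zip(f, mu))

def triple_norm_sq(f, P):
    """|||f|||_inf^2 = 1/2 * sup_x sum_y |f(x) - f(y)|^2 P(x, y)."""
    n = len(f)
    return Fr(1, 2) * max(sum((f[x] - f[y]) ** 2 * P[x][y] for y in range(n))
                          for x in range(n))

a, b = Fr(1, 3), Fr(1, 6)          # illustrative transition probabilities
P, mu = two_state_chain(a, b)
f = [Fr(0), Fr(1)]
gap = a + b                        # 1 - eta_2 for the two-state chain
print(gap * variance(f, mu), dirichlet_form(f, P, mu), triple_norm_sq(f, P))
```

Here the triple norm only looks at the worst starting state x, whereas the Dirichlet form averages over x with respect to μ.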


We are now in a position to state Ledoux's concentration of measure result on Markov
chains [Ledoux, 2005].

Theorem 2.3.1. Let $(\Pi, \mu)$ be a reversible Markov chain on X with a spectral gap
given by $\lambda_1 > 0$. Then, whenever $|||F|||_\infty \le 1$, F is integrable with respect to μ and
for every $r \ge 0$,

$$\mu\left(\left\{F \ge \int F\,d\mu + r\right\}\right) \le 3e^{-r\sqrt{\lambda_1}/2}.$$

Proof. We will begin by assuming that F is a bounded function on X with mean 0
and $|||F|||_\infty \le 1$. We will let

$$\Lambda(\lambda) = \int e^{\lambda F}\,d\mu$$

for $\lambda \ge 0$. By definition,

$$\mathcal{E}(e^{\lambda F/2}, e^{\lambda F/2}) = \frac{1}{2}\sum_{x, y \in X}\left[e^{\lambda F(x)/2} - e^{\lambda F(y)/2}\right]^2\Pi(x, y)\mu(\{x\}),$$

which, by symmetry, is equal to

$$\sum_{F(y) < F(x)}\left[e^{\lambda F(x)/2} - e^{\lambda F(y)/2}\right]^2\Pi(x, y)\mu(\{x\}) = \sum_{F(y) < F(x)}e^{\lambda F(x)}\left[1 - e^{-\lambda(F(x) - F(y))/2}\right]^2\Pi(x, y)\mu(\{x\}).$$

Using the fact that F(y) < F(x) in the region we are summing over, together with the
first order Taylor bound $0 \le 1 - e^{-u} \le u$ for $u \ge 0$, the above is

$$\le \frac{\lambda^2}{4}\sum_{F(y) < F(x)}e^{\lambda F(x)}(F(x) - F(y))^2\,\Pi(x, y)\mu(\{x\}).$$

Going back to summing over the whole $x, y \in X$ only adds nonnegative terms, leaving us with

$$\le \frac{\lambda^2}{4}\sum_{x, y \in X}e^{\lambda F(x)}(F(x) - F(y))^2\,\Pi(x, y)\mu(\{x\}) \le \frac{\lambda^2}{2}|||F|||_\infty^2\,\Lambda(\lambda),$$

so in particular we have shown that

$$\mathcal{E}(e^{\lambda F/2}, e^{\lambda F/2}) \le \lambda^2|||F|||_\infty^2\,\Lambda(\lambda).$$

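The Poincaré inequality says that

$$\lambda_1\mathrm{Var}_\mu(f) \le \mathcal{E}(f, f).$$

Using this, and the fact that

$$\lambda_1\mathrm{Var}_\mu(e^{\lambda F/2}) = \lambda_1\left(\Lambda(\lambda) - \Lambda^2(\lambda/2)\right),$$

we have that

$$\lambda_1\left(\Lambda(\lambda) - \Lambda^2(\lambda/2)\right) \le \lambda^2|||F|||_\infty^2\,\Lambda(\lambda).$$

Recalling that $|||F|||_\infty \le 1$ by assumption, we have the inequality

$$\lambda_1\left(\Lambda(\lambda) - \Lambda^2(\lambda/2)\right) \le \lambda^2\Lambda(\lambda).$$

Solving for Λ(λ) gives

$$\Lambda(\lambda) \le \frac{1}{1 - \lambda^2/\lambda_1}\,\Lambda^2\left(\frac{\lambda}{2}\right).$$

Now we use the same inequality on the $\Lambda(\lambda/2)$ term and iterate n times, leaving us
with

$$\Lambda(\lambda) \le \prod_{k=0}^{n-1}\left(\frac{1}{1 - \frac{\lambda^2}{4^k\lambda_1}}\right)^{2^k}\Lambda\left(\frac{\lambda}{2^n}\right)^{2^n}.$$

We now let $n \to \infty$. The product will converge provided that $\lambda < \sqrt{\lambda_1}$. This
assumption does not cause any problems as we only required that λ be nonnegative.
Recall that $\Lambda(\lambda) = \int e^{\lambda F}d\mu$ and that F is bounded with mean zero, so Λ(λ) = 1 + o(λ). This gives
$\Lambda(\lambda/2^n)^{2^n} \to 1$ as $n \to \infty$. Hence we are left with

$$\Lambda(\lambda) \le \prod_{k=0}^{\infty}\left(\frac{1}{1 - \frac{\lambda^2}{4^k\lambda_1}}\right)^{2^k}.$$

If we now set $\lambda = \frac{1}{2}\sqrt{\lambda_1}$, we have

$$\Lambda\left(\tfrac{1}{2}\sqrt{\lambda_1}\right) \le \prod_{k=0}^{\infty}\left(\frac{1}{1 - \frac{1}{4^{k+1}}}\right)^{2^k} \le 3.$$

Recall this tells us that

$$\int e^{\sqrt{\lambda_1}F/2}\,d\mu \le 3.$$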
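
The numerical constant 3 comes from an infinite product, which can be evaluated directly. The sketch below assumes the product has the form $\prod_{k \ge 0}(1 - 4^{-(k+1)})^{-2^k}$, as obtained by setting $\lambda = \frac{1}{2}\sqrt{\lambda_1}$ in the iterated inequality; the function name is illustrative. It sums logarithms with `math.log1p` to stay accurate when the factors are very close to 1.

```python
import math

def ledoux_product(n_terms=60):
    """Partial product of (1 / (1 - 4**-(k+1))) ** (2**k) for k < n_terms."""
    log_total = 0.0
    for k in range(n_terms):
        # log of the k-th factor: -2^k * log(1 - 4^-(k+1))
        log_total += -(2.0 ** k) * math.log1p(-(4.0 ** -(k + 1)))
    return math.exp(log_total)

value = ledoux_product()
print(value)
```

The computed value is comfortably below 3, so the constant in the theorem is not sharp for this step of the argument.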


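Markov's inequality states that for a nonnegative integrable random variable X and
a > 0,

$$P(X \ge a) \le \frac{E(X)}{a}.$$

Applying this to the above equation, with $a = e^{r\sqrt{\lambda_1}/2}$, we have

$$\mu\left(e^{\sqrt{\lambda_1}F/2} \ge e^{r\sqrt{\lambda_1}/2}\right) \le 3e^{-r\sqrt{\lambda_1}/2},$$

so

$$\mu\left(\tfrac{1}{2}\sqrt{\lambda_1}\,F \ge \tfrac{1}{2}\sqrt{\lambda_1}\,r\right) \le 3e^{-r\sqrt{\lambda_1}/2},$$

giving

$$\mu(F \ge r) \le 3e^{-r\sqrt{\lambda_1}/2}. \qquad (2.10)$$

This is essentially the result we wanted to prove, except that to begin with, we
assumed that F was a mean zero function and bounded. To get rid of the mean 0
condition, we can simply replace F in the beginning of the proof with $F' = F - \mathbb{E}(F)$,
giving us a mean zero function and our desired result. To relax the boundedness
condition, we approximate |F| by $F_n = \min(|F|, n)$. Notice this still satisfies $|||F_n|||_\infty \le 1$.
Choose an m such that $\mu(|F| \le m) \ge 1/2$, so that $\mu(F_n \le m) \ge 1/2$ for all n, and an r such that
$3e^{-r\sqrt{\lambda_1}/2} < 1/2$. Since, applying (2.10) to the centered function $\int F_n\,d\mu - F_n$,

$$\mu\left(F_n \le \int F_n\,d\mu - r\right) \le 3e^{-r\sqrt{\lambda_1}/2} < \frac{1}{2},$$

we must have

$$\int F_n\,d\mu \le m + r.$$

Then, by the monotone convergence theorem, we have

$$\int|F|\,d\mu < \infty.$$

We can then apply (2.10) to min(max(F, −n), n) and let $n \to \infty$ to get the final result.

□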


In chapter 3, we will see multiple ways that this theorem can be applied to get 
concentration of measure results for a variety of interesting quantities merely by 
choosing an appropriate Markov chain with a known spectral gap. 


2.4 The Euler-Maclaurin Formula 


The Euler-Maclaurin formula is a formula which enables us to make a connection
between a sum and its corresponding integral, provided the function is sufficiently
smooth. Before we can state the formula, we need a few preliminary definitions
and notation. Let $\lfloor x\rfloor$ denote the greatest integer function, so that $\lfloor x\rfloor$ returns the
greatest integer less than or equal to x. For s = 1, 2, ..., let $B_s(x)$ denote the
Bernoulli polynomials. The generating function for the Bernoulli polynomials is as
follows:

$$\frac{te^{xt}}{e^t - 1} = \sum_{s=0}^{\infty}B_s(x)\frac{t^s}{s!}.$$

For $s \ge 1$, we will let $B_s := B_s(0)$. These $B_s$ are called the Bernoulli numbers. The
first few Bernoulli numbers are given by $B_1 = -1/2$, $B_2 = 1/6$, $B_3 = 0$, $B_4 = -1/30$, $B_5 = 0$,
$B_6 = 1/42$. See [Andrews et al., 1999] for more details and alternative definitions.


We now have everything that we need to state the Euler-Maclanrin formnla. 
Theorem 2.4.1. Suppose f has continuous derivatives up to order s. Then 


n pjl S jy 

fix) = / fix)dx + 

m+1 i=l 


-^iF-^\n)-f^^-^^im)) 

/ly-l rn 

+ ^- / Bsix - [x\)f^''\x)dx (2.11) 

^• .I'm 


Notice that this formula allows a sum to be estimated by its corresponding integral (or an integral by its sum), and gives an exact formula for the error in using this estimation. In many applications, this error term can at least be bounded, if not computed exactly. The proof of the formula involves successively performing integration by parts, which gives a sequence of periodic functions relating to the Bernoulli polynomials. For a proof, see [Andrews et al., 1999]. We will use a similar method of proof to prove a q-deformed version of Stirling’s formula in a later section.
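As an illustration of the Euler-Maclaurin formula, consider a polynomial $f$ of degree less than $s$. Then $f^{(s)} = 0$, the remainder integral vanishes, and the sum equals the integral plus the boundary correction terms exactly. The sketch below checks this for $f(x) = x^3$ with $m = 0$, $n = 10$, $s = 4$, assuming the form $\sum_{i=m+1}^{n} f(i) = \int_m^n f\,dx + \sum_{r=1}^{s} (-1)^r \frac{B_r}{r!}(f^{(r-1)}(n) - f^{(r-1)}(m))$ of the correction terms. All function names are ours.

```python
from fractions import Fraction
from math import comb, factorial


def bernoulli_numbers(count):
    """Exact Bernoulli numbers B_0..B_{count-1}, with B_1 = -1/2."""
    numbers = [Fraction(1)]
    for m in range(1, count):
        partial = sum(comb(m + 1, j) * numbers[j] for j in range(m))
        numbers.append(-partial / (m + 1))
    return numbers


def poly_eval(coeffs, x):
    """Evaluate sum_j coeffs[j] * x**j exactly."""
    return sum(c * Fraction(x) ** j for j, c in enumerate(coeffs))


def poly_derivative(coeffs):
    """Coefficients of the derivative polynomial."""
    return [j * c for j, c in enumerate(coeffs)][1:]


def poly_integral(coeffs, lower, upper):
    """Exact integral of the polynomial over [lower, upper]."""
    antiderivative = [Fraction(0)] + [Fraction(c) / (j + 1) for j, c in enumerate(coeffs)]
    return poly_eval(antiderivative, upper) - poly_eval(antiderivative, lower)


def euler_maclaurin(coeffs, m, n, s):
    """Integral plus the first s correction terms; the remainder is omitted."""
    B = bernoulli_numbers(s + 1)
    total = poly_integral(coeffs, m, n)
    derivative = list(coeffs)  # holds f^{(r-1)} at the start of iteration r
    for r in range(1, s + 1):
        boundary = poly_eval(derivative, n) - poly_eval(derivative, m)
        total += (-1) ** r * B[r] / factorial(r) * boundary
        derivative = poly_derivative(derivative)
    return total


cubic = [0, 0, 0, 1]  # f(x) = x^3
direct_sum = sum(poly_eval(cubic, i) for i in range(1, 11))  # i = m+1, ..., n
formula = euler_maclaurin(cubic, m=0, n=10, s=4)
print(direct_sum, formula)
```

For a function that is not a polynomial, the omitted remainder integral is exactly the error term discussed above.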
3 Random Operator Compressions


3.1 Background for First Result 


In a recent work of Chatterjee and Ledoux on concentration of measure for random submatrices [Chatterjee and Ledoux, 2009], it is proved that for an arbitrary Hermitian matrix of order $n$ and $k \le n$ sufficiently large, the distribution of eigenvalues is almost the same for any principal submatrix of order $k$. Their proof uses the random transposition walk on the symmetric group $S_n$ and concentration of measure techniques. To further generalize their results, we observe that it is important to use a Markov chain which does not change too many matrix entries all at once and whose spectral gap is known. Instead of looking at a Markov chain on $S_n$, we first consider a Markov chain on the special orthogonal group $SO(n)$, the group of $n \times n$ orthogonal matrices with determinant 1. As a linear transformation, every element of $SO(n)$ is a rotation and preserves distances. We introduce Kac’s walk on $SO(n)$ and demonstrate that it is sufficiently similar to the transposition Markov chain to allow for Chatterjee and Ledoux’s results to carry over to the more general case of operator compressions. It should be noted that a similar result has been proved by Meckes and Meckes [Meckes and Meckes, 2011] using different techniques. In a more recent work [Meckes and Meckes, 2013], Meckes and Meckes have extended their techniques to include several other classes of random matrices and prove almost sure convergence of the empirical spectral measure. The purpose of this paper is to highlight the fact that the methods of Chatterjee and Ledoux can be extended to include more general cases, provided the Markov chain used satisfies appropriate conditions. To emphasize this point, we also apply the method to get a concentration of measure result for a compression by a matrix of Gaussians using Kac’s walk coupled to a thermostat. We also show an application of this method to the length of the longest increasing subsequence of a random walk evolving under the asymmetric exclusion process. The results of this section can be found in [Ng and Walters, 2014].


Following the notation of Chatterjee and Ledoux, for a given Hermitian matrix $A$ of order $n$, with eigenvalues given by $\lambda_1, \ldots, \lambda_n$, we let $F_A$ denote the empirical spectral distribution function of $A$. This is defined as

$$F_A(x) := \frac{\#\{i : \lambda_i \le x\}}{n}.$$


3.2 Kac’s Walk on SO(n)


The following model, introduced by Kac [Kac, 1954], describes a system of particles evolving under a random collision mechanism such that the total energy of the system is conserved. Given a system of $n$ particles in one dimension, the state of the system is specified by $v = (v_1, \ldots, v_n)$, the velocities of the particles. At a time step $t$, $i$ and $j$ are chosen uniformly at random from $\{1, \ldots, n\}$ and $\theta$ is chosen uniformly at random on $(-\pi, \pi]$. The $i$ and $j$ correspond to a collision between particles $i$ and $j$ such that the energy,

$$E = \sum_{k=1}^{n} v_k^2,$$

is conserved. Under this constraint, after a collision, the new velocities will be of the form $v_i^{new} = v_i\cos(\theta) + v_j\sin(\theta)$ and $v_j^{new} = v_j\cos(\theta) - v_i\sin(\theta)$. For $i < j$, let $R_{ij}(\theta)$ be the rotation matrix given by:

$$R_{ij}(\theta) = \begin{pmatrix} I & & & & \\ & \cos(\theta) & & \sin(\theta) & \\ & & I & & \\ & -\sin(\theta) & & \cos(\theta) & \\ & & & & I \end{pmatrix},$$

where the $\cos(\theta)$ and $\sin(\theta)$ terms are in the rows and columns labeled $i$ and $j$, and the $I$ denote identity matrices of different sizes (possibly 0). We will use the convention that $R_{ii}(\theta) = I$. After one step of the process, $v_{new} = R_{ij}(\theta)v$.
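The collision rule can be sketched in a few lines. The example below is only an illustration under simplifying assumptions: it always picks two distinct particles (the text also allows $i = j$, which acts as the identity), it takes the energy to be the sum of squared velocities, and the names `kac_step` and `energy` are ours. Because each step is a plane rotation, the energy should be unchanged up to floating-point error.

```python
import math
import random


def kac_step(velocities, rng):
    """Apply one Kac collision: rotate a random pair (v_i, v_j) by a random angle."""
    n = len(velocities)
    i, j = rng.sample(range(n), 2)          # two distinct particle indices
    theta = rng.uniform(-math.pi, math.pi)  # collision angle
    v_i, v_j = velocities[i], velocities[j]
    updated = list(velocities)
    updated[i] = v_i * math.cos(theta) + v_j * math.sin(theta)
    updated[j] = v_j * math.cos(theta) - v_i * math.sin(theta)
    return updated


def energy(velocities):
    """Total energy, taken here to be the sum of squared velocities."""
    return sum(v * v for v in velocities)


rng = random.Random(0)  # fixed seed, chosen only for reproducibility
state = [rng.gauss(0.0, 1.0) for _ in range(10)]
initial_energy = energy(state)
for _ in range(1000):
    state = kac_step(state, rng)
print(initial_energy, energy(state))
```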

In our case, we will be considering this process acting on $SO(n)$, so instead of vectors in $\mathbb{R}^n$, our states will be given by matrices $G \in SO(n)$. Then we can define the one-step Markov transition operator for Kac’s walk, $Q$, on continuous functions of $SO(n)$:

$$Qf(G) = \frac{1}{\binom{n}{2}} \sum_{i<j} \frac{1}{2\pi}\int_0^{2\pi} f(R_{ij}(\theta)G)\,d\theta \qquad (3.1)$$

for any $G \in SO(n)$, where $f$ is a continuous function on $SO(n)$. Notice that this is a slightly different setup than we introduced in Chapter 2. Instead of a finite state space Markov chain, we now have an infinite state space. We will pause to discuss the differences between our previous case and this case. Since our state space is infinite, we cannot define our transition probabilities using finite dimensional matrices. We instead define a Markov transition operator on continuous functions of our space. In the context of our earlier discussion,

$$Qf(G) = E(f(X_1) \mid X_0 = G).$$

In other words, $Qf(G)$ gives us the expected value of $f$ after one step of the chain, conditioned on the fact that we start at $G \in SO(n)$. It turns out that this fully specifies our Markov chain. In order to generalize the methods of Chatterjee and Ledoux to this case, we need to know the invariant distribution and the spectral gap of Kac’s walk. This is given in the following result.


Theorem 3.2.1 ( [Carlen et al, 20*^ Maslen, 2003| ). Kac’s walk on SO{n) is ergodic 
and its invariant distribution is the uniform distribution on $SO(n)$. Furthermore, the spectral gap of Kac’s walk on $SO(n)$ is $\frac{n+2}{2(n-1)n}$.


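Using our Markov transition operator, we can define the Dirichlet form, $\mathcal{E}(\cdot,\cdot)$. As discussed in chapter 2, it is well known that for a Markov chain with spectral gap $\lambda_1$, the Poincaré inequality holds:

$$\lambda_1 \mathrm{Var}(f) \le \mathcal{E}(f, f).$$

For Kac’s walk, we have

$$\mathcal{E}(f,f) = \frac{1}{2\binom{n}{2}} \sum_{1 \le i < j \le n} \frac{1}{2\pi}\int_0^{2\pi}\int_{SO(n)} \left(f(G) - f(R_{ij}(\theta)G)\right)^2 d\mu_n(G)\,d\theta,$$

where $\mu_n$ is the Haar measure on $SO(n)$ normalized so that the total measure is 1. Let us define the triple norm:

$$|||f|||_\infty^2 = \frac{1}{2\binom{n}{2}} \sup_{G \in SO(n)} \sum_{1 \le i < j \le n} \frac{1}{2\pi}\int_0^{2\pi} \left|f(G) - f(R_{ij}(\theta)G)\right|^2 d\theta. \qquad (3.2)$$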


The following result is analogous to Theorem 3.3 from Ledoux’s Concentration of Measure Phenomenon book [Ledoux, 2005] (discussed and proved in chapter 2). We reproduce the proof of the theorem here to verify that even though our situation does not satisfy the conditions of the theorem, the exact same argument carries through for Kac’s walk on $SO(n)$. We omit some details here as they are the same as the argument in chapter 2.

Theorem 3.2.2. Consider Kac’s walk on $SO(n)$, and let $F : SO(n) \to \mathbb{R}$ be given such that $|||F|||_\infty \le 1$. Then $F$ is integrable with respect to $\mu_n$, and for every $r \ge 0$,

$$\mu_n\left(F \ge \int F\,d\mu_n + r\right) \le 3e^{-r\sqrt{\lambda_1}/2},$$

where $\lambda_1 = \frac{n+2}{2(n-1)n}$ is the spectral gap of Kac’s walk on $SO(n)$.


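Proof. We first demonstrate that $\mathcal{E}(e^{\lambda F/2}, e^{\lambda F/2}) \le \frac{\lambda^2}{4}|||F|||_\infty^2 \int_{SO(n)} e^{\lambda F(G)}\,d\mu_n(G)$ by using symmetry (see chapter 2 for details).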


g(,AF/2_^AF/2)_ 1 ^ 

2 ( 2 ) ^^JsO{n) 

= A E t'ff 

Uj JF{G)>F(R,,{e)G) 

\2 1 j'2-K 1 p 

V / — / {F{G) - F{R,^{e)G)f e^^^^UpniG)de 

^ I- 27r Jso{n) 


< 


4 20 ^ 

^\2} \<i<j<n 


4 


Omi 


' SO{n) 


dfin{G) 


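Setting $\Lambda(\lambda) = \int_{SO(n)} e^{\lambda F(G)}\,d\mu_n(G)$, we combine this with the Poincaré inequality to obtain

$$\lambda_1 \mathrm{Var}(e^{\lambda F/2}) = \lambda_1\left(\Lambda(\lambda) - \Lambda^2(\lambda/2)\right) \le \mathcal{E}(e^{\lambda F/2}, e^{\lambda F/2}) \le \frac{\lambda^2}{4}|||F|||_\infty^2 \Lambda(\lambda).$$

Incorporating the assumption $|||F|||_\infty \le 1$ yields

$$\Lambda(\lambda) \le \frac{1}{1 - \frac{\lambda^2}{4\lambda_1}}\Lambda^2(\lambda/2).$$

Iterating the inequality $n$ times gives

$$\Lambda(\lambda) \le \prod_{k=0}^{n-1}\left(\frac{1}{1 - \frac{\lambda^2}{4^{k+1}\lambda_1}}\right)^{2^k}\Lambda^{2^n}(\lambda/2^n).$$

Since $\Lambda(\lambda) = 1 + o(\lambda)$, we see that $\Lambda^{2^n}(\lambda/2^n) \to 1$ as $n \to \infty$. This gives the upper bound

$$\Lambda(\lambda) \le \prod_{k=0}^{\infty}\left(\frac{1}{1 - \frac{\lambda^2}{4^{k+1}\lambda_1}}\right)^{2^k}.$$

By plugging in $\lambda = \sqrt{\lambda_1}$, using the crude estimate $\prod_{k=0}^{\infty}\left(\frac{1}{1 - \frac{1}{4^{k+1}}}\right)^{2^k} \le 3$, and applying Chebyshev’s inequality (as in chapter 2), we obtain the result. □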


3.3 Main Result 


Using these results, along with the method of Chatterjee and Ledoux, we are able to prove the following result:


Theorem 3.3.1. Take any $1 \le k \le n$ and an $n$-dimensional Hermitian matrix $G$. Let $A$ be the $k \times k$ matrix consisting of the first $k$ rows and $k$ columns of the matrix obtained by conjugating $G$ by a rotation matrix $R_{ij} \in SO(n)$ chosen uniformly at random. If we let $F$ be the expected spectral distribution of $A$, then for each $r > 0$,


P 



A’lloo > 



< 12A/fcexp 



Proof. The proof of this theorem uses the method introduced by Chatterjee and Ledoux [Chatterjee and Ledoux, 2009] with appropriate changes made to apply to the situation we are considering.

Let $R_{ij}(\theta) \in SO(n)$ and let $A$ be as stated above. Note that since $A$ is a compression of a Hermitian operator, it will also be Hermitian. Fix $x \in \mathbb{R}$. Let $f(A) := F_A(x)$, where $F_A(x)$ is the empirical spectral distribution of $A$. Let $Q$ be the transition operator as defined in (3.1) and let $|||\cdot|||_\infty$ be as in (3.2). Using Lemma 2.2 from [Bai, 1999], we know that for any two Hermitian matrices $A$ and $B$ of order $k$,

$$\|F_A - F_B\|_\infty \le \frac{\mathrm{rank}(A - B)}{k}.$$

In our case, taking one step in Kac’s walk is equivalent to rotation in a random plane by a random angle. Hence $A$ and $R_{ij}A$ will differ in at most two rows and two columns, bounding the difference in rank by 2, so

$$\|F_A - F_{R_{ij}A}\|_\infty \le \frac{2}{k}.$$


Using (2), 

iii/iiiL-^U sup 5^ EinA) - mi^A)]^ 

2(J -4eso(„) 

2 \k J \n J kn 

where the ^ comes from the probability that both i and j are greater than k, in 
which case, A and RfjA will be the same. From Theorems 2.1 and 2.2, we have that 


P(|F^(a;) — F{x)\ > r) < 6exp 

= 6exp < 6exp (^-r/2^'^ 

This is true for any x. Now, if we let Fa{x—) ;= F Aiy) i then we have 

KFa{x—) = F{y) = F{x—). Hence, for r > 0, 



P(|FA(a;-) - EFa{x-)\ > r) < limP(|FA(i/) - F{y)\ > r) 


< 6 exp 
This holds for all $r$, so we can replace $>$ by $\ge$. Next we will fix $\ell \in \mathbb{Z}_{\ge 2}$. For $1 \le i < \ell$, let

$$t_i := \inf\{x : F(x) \ge i/\ell\}$$

and $t_0 = -\infty$, $t_\ell = \infty$. Then for each $i$, $F(t_{i+1}-) - F(t_i) \le 1/\ell$. Let

$$\Delta = \left(\max_{1 \le i < \ell} |F_A(t_i) - F(t_i)|\right) \vee \left(\max_{1 \le i < \ell} |F_A(t_i-) - F(t_i-)|\right).$$

Take any $x \in \mathbb{R}$. Let $i$ be an index where $t_i \le x < t_{i+1}$. Then

$$F_A(x) \le F_A(t_{i+1}-) \le F(t_{i+1}-) + \Delta \le F(x) + 1/\ell + \Delta$$

and

$$F_A(x) \ge F_A(t_i) \ge F(t_i) - \Delta \ge F(x) - 1/\ell - \Delta.$$

Using these two facts, we get that

$$\|F_A - F\|_\infty \le 1/\ell + \Delta.$$


Then for any r > 0, we have 


P(||F^ - F|U > 1 /i + r)< 12{i - 1) exp 



Letting i = + 1, we have 


F{\\Fa - F||oo > -^+r) < 12v^exp 



which concludes the proof of our theorem. 


□ 
3.4 Kac’s Model Coupled to a Thermostat


Using a spectral gap result from [Bonetto et al., 2014], we are able to demonstrate the application of this method to a more complicated Markov chain. In this system, the particles from Kac’s system interact amongst themselves with a rate $\lambda$ and interact with a particle from a thermostat with rate $\mu$. The particles in the thermostat are Gaussian with variance $1/\beta$, so they have already reached equilibrium. The Markov transition operator for Kac’s walk is defined as in (3.1) and the Markov transition operator for the thermostat is given by


1^1 ^27r p jo 

= e-^<f''‘^f(V,(0,u)G)d0dM (3.3) 

^ 27r Jo Jr" V 271 


where $\omega = (\omega_1, \omega_2, \ldots, \omega_n)$, and $V_j(\theta, \omega)$ sends each element $g_{ij}$ in column $j$ to $g_{ij}\cos(\theta) + \omega_i\sin(\theta)$ for $i = 1$ to $n$, with $\omega_i^{new} = -g_{ij}\sin(\theta) + \omega_i\cos(\theta)$. In [Bonetto et al., 2014], they consider the Markov chain acting on a vector. We consider the Markov chain acting on a matrix by treating the matrix as $n$ independent vectors. Using this adaptation, the following theorem follows immediately from the results proved in [Bonetto et al., 2014].


Theorem 3.4.1. Kac’s walk coupled to a thermostat has unique invariant measure 
given by 

^n = \{ 

hj 



and has spectral gap ^ 


For the thermostat alone (letting $\lambda = 0$), we can again prove a theorem analogous to Chatterjee and Ledoux’s theorem 3.3. Let $\mathcal{G}$ be the set of $n \times n$ matrices with independent and identically distributed $N(0, 1/\beta)$ entries. We can define the Dirichlet form and the triple norm for the thermostat as


^ /*27r 


a(/./) = ^h.^ 

i=i 


'Geg 


n \ n/2 

e-^<ifiV,ie,w))G - f{G))diyndwde 


1^1 


OO = sup 

Gee 2n 2 tt 




/O \ n/2 

e-‘^<\f{V,(l),w))G - f{G)\^dwdfl 


(3.4) 


Using these, we can prove a concentration of measure result for the thermostat analogous to Theorem 3.2.2.

Theorem 3.4.2. Consider the Gaussian thermostat and let $F : \mathcal{G} \to \mathbb{R}$ be such that $|||F|||_\infty \le 1$. Then $F$ is integrable with respect to $\nu_n$ and for every $r \ge 0$,

Un{F > Fdun + r)< 3e-"^/2 


where ^ is the spectral gap of the thermostat process. 


We omit the proof here as it is similar to the proof of Theorem 3.2.2. 


Using this result and Theorem 3.4.1, we can prove the following concentration of 
measure inequality. 

Theorem 3.4.3. Take any $1 \le k \le n$ and an $n$-dimensional Hermitian matrix $G$. Let $S$ be an $n \times k$ matrix whose $k$ columns are the first $k$ columns of a random matrix with distribution $\nu_n$. Let $A$ be the $k \times k$ matrix obtained by conjugating $G$ by $S$. Letting $F$ denote the expected spectral distribution of $A$, for each $r > 0$,

P(||Fa - F||oo > -^ + r) < 12v^exp 

Vk 

where $\mu$ is the rate of the interaction with the thermostat.
Proof. The proof of this theorem closely follows the proof of Theorem 3.3.1, with appropriate changes made. Let $A$ be as stated above, and let $A'$ be $A$ after one step of the Markov chain. Fix $x \in \mathbb{R}$ and let $f(A) := F_A(x)$, where $F_A$ is the empirical spectral distribution of $A$. Notice that $\mathrm{rank}(A - A') \le 3$, since after one step of the chain, at most 3 columns of $A$ will be changed (two from Kac’s walk, and one from the thermostat). Again using the inequality from [Bai, 1999], we know that

$$\|f(A) - f(A')\|_\infty \le \frac{3}{k}.$$


2© 


— sup 
n A 


E 


n 




/\\2 


E\f{A)-f{A')\ 


l<i<j<n k=l 

where the first sum is over possible interactions in Kac’s process and the second is over possible particle interactions with the thermostat. The above is

$$\le \frac{1}{2}\left(\frac{3}{k}\right)^2\left(\frac{3k}{n}\right) = \frac{27}{2kn}.$$


Using theorems 3.4.1 and 3.4.2, we have that 


P(|Fa(x) - F(x)| >r)< 6exp ( 



Following the rest of the proof in 3.2.1 (with the appropriate numbers changed), we 
get 

P(||Fa-F||oo >^ + r)< uVkexp 


□ 
3.5 An Additional Application: The Length of the Longest Increasing Subsequence of a Random Walk Evolving under the Asymmetric Exclusion Process


Consider a random walk $X$ on $\{1, \ldots, n\}$. Represent $X$ by some element in $\{0,1\}^n$, where $X_i = 0$ corresponds to a step down in the walk at position $i$ and $X_i = 1$ corresponds to a step up. We will assume that

$$\sum_{i=1}^{n} X_i = \frac{n}{2},$$

so that we have the same number of up steps as down steps. We can now look at this random walk as the initial configuration of a particle process with $X_i = 1$ corresponding to a particle in position $i$ and $X_i = 0$ corresponding to no particle at position $i$. Consider the asymmetric exclusion process acting on this configuration with the following dynamics. At each step of the process, a number $i$ is chosen uniformly in $\{1, \ldots, n-1\}$. If $X_i = X_{i+1}$, then the configuration stays the same. If $X_i = 1$ and $X_{i+1} = 0$, then the values of $X_i$ and $X_{i+1}$ switch with probability $1 - q/2$, and if $X_i = 0$ and $X_{i+1} = 1$, then the values switch with probability $q/2$. Viewed in this way, the asymmetric exclusion process is a Markov process on the set of random walks. See [Liggett, 1985] for an in-depth discussion of the asymmetric exclusion process.
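The dynamics described above can be sketched as a short simulation. This is only an illustration: the function name `asep_step`, the choice $q = 0.9$, the initial configuration, and the 0-based indexing are ours, and the swap probabilities $1 - q/2$ and $q/2$ are taken as stated in the text. Since every move exchanges two neighboring values, the number of particles is conserved.

```python
import random


def asep_step(config, q, rng):
    """One step of the exclusion dynamics on a 0/1 configuration (returns a new list)."""
    n = len(config)
    i = rng.randrange(n - 1)  # 0-based site index; the text uses 1, ..., n-1
    left, right = config[i], config[i + 1]
    updated = list(config)
    if left == right:
        return updated  # nothing to exchange
    swap_probability = 1 - q / 2 if (left, right) == (1, 0) else q / 2
    if rng.random() < swap_probability:
        updated[i], updated[i + 1] = right, left
    return updated


rng = random.Random(1)  # fixed seed, chosen only for reproducibility
n = 20
config = [1] * (n // 2) + [0] * (n // 2)  # n/2 particles, all on the left
for _ in range(5000):
    config = asep_step(config, q=0.9, rng=rng)
print(config, sum(config))
```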


Theorem 3.5.1 ([Koma and Nachtergaele, 1997, Alcaraz, 1994, Caputo and Martinelli, 2003]). The spectral gap of the ASEP is $\lambda_n = 1 - \Delta^{-1}\cos(\pi/n)$, where $\Delta = \frac{q + q^{-1}}{2}$ for a parameter $q$ satisfying $0 < q < 1$.


In our case, take $q = 1 - c/n^{\alpha}$, for a constant $c$, and $0 < \alpha < 1$, such that $q \approx e^{-c/n^{\alpha}}$. Then Taylor approximating and simplifying gives

$$\lambda_n \approx \frac{c^2}{2n^{2\alpha}}.$$
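The Taylor approximation can be checked numerically. The sketch below assumes the gap has the form $\lambda_n = 1 - \Delta^{-1}\cos(\pi/n)$ with $\Delta = (q + q^{-1})/2$, and compares it with $c^2/(2n^{2\alpha})$ for $q = 1 - c/n^{\alpha}$. The values $c = 1$ and $\alpha = 1/2$ and the function name `asep_gap` are chosen here only for illustration.

```python
import math


def asep_gap(n, q):
    """Gap formula 1 - cos(pi/n)/Delta with Delta = (q + 1/q)/2 (assumed form)."""
    delta = (q + 1.0 / q) / 2.0
    return 1.0 - math.cos(math.pi / n) / delta


c, alpha = 1.0, 0.5
for n in (10**2, 10**4, 10**6):
    q = 1.0 - c / n**alpha
    exact = asep_gap(n, q)
    approx = c**2 / (2.0 * n ** (2 * alpha))
    print(n, exact, approx, exact / approx)
```

The ratio in the last column should approach 1 as $n$ grows, which is the sense in which the approximation is used.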

Now let $M_X$ denote the height of the midpoint of the random walk at a fixed time during the process. In other words, $M_X = X_{n/2}$, assuming $n$ is even. Note that the range of this function is $[-n/2, n/2]$. Let $M'_X$ be the evolution of $M_X$ after one step of the process. Notice that

$$|M_X - M'_X| \le 1,$$

since switching the position of two adjacent particles can change the height of the midpoint by at most 1. Then

$$|||M|||_\infty^2 = \frac{1}{2}\max_X E(M_X - M'_X)^2 \le \frac{1}{2}(1)^2\left(\frac{1}{n-1}\right) = \frac{1}{2(n-1)}.$$

The $\frac{1}{n-1}$ appears because the only choice of $i$ that will affect the midpoint is $i = n/2$.


Now plugging into the Chatterjee-Ledoux theorem, we have the following result.

Theorem 3.5.2. Letting $M_X$ denote the height of the midpoint of the random walk after evolution under the asymmetric exclusion process, for all $r > 0$ and $q = 1 - c/n^{\alpha}$,

$$P(|M_X - EM_X| \ge r) \le 6\exp\left(-\frac{r}{2}\sqrt{\frac{c^2/2n^{2\alpha}}{1/(2(n-1))}}\right) = 6\exp\left(-\frac{r}{2}\sqrt{\frac{c^2(n-1)}{n^{2\alpha}}}\right).$$

Notice that this implies that the height of the midpoint has fluctuations of order at most $n^{\alpha - 1/2}$, which is bounded above by a constant for $0 < \alpha \le 1/2$.

Figure 3.1: A longest increasing subsequence of a random walk

Consider the length of the longest increasing (non-decreasing) subsequence of the random walk. This is defined as

$$L_X = \max\{k : i_1 < i_2 < \cdots < i_k \text{ and } X_{i_1} \le X_{i_2} \le \cdots \le X_{i_k}\}.$$
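For a concrete sequence, the length of the longest non-decreasing subsequence can be computed with the standard patience-sorting technique. The sketch below applies it to the heights of a small walk. Reading the definition as applying to walk heights is our assumption, as are the example steps and the function name `longest_nondecreasing_length`.

```python
from bisect import bisect_right
from itertools import accumulate


def longest_nondecreasing_length(sequence):
    """Length of the longest non-decreasing subsequence (patience sorting)."""
    tails = []  # tails[k] = smallest possible last value of such a subsequence of length k+1
    for value in sequence:
        position = bisect_right(tails, value)
        if position == len(tails):
            tails.append(value)
        else:
            tails[position] = value
    return len(tails)


steps = [1, 1, 0, 1, 0, 0, 1, 0]                      # 1 = up, 0 = down
heights = list(accumulate(2 * x - 1 for x in steps))  # walk heights after each step
print(heights, longest_nondecreasing_length(heights))
```

Using `bisect_right` rather than `bisect_left` is what allows repeated values, so the result is the non-decreasing rather than the strictly increasing version.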


See [Angel et al., 2014] for a more in-depth description of this topic and results for the simple random walk. Notice that the height of the midpoint gives a lower bound on the length of the longest increasing subsequence. Using ASEP as our Markov process and the spectral gap above, we can prove concentration of measure for $L_X$. Notice that switching the position of two adjacent particles via ASEP can change $L_X$ by at most 1. As before, let $X'$ be the evolution of $X$ after one step of the process.
Then, bounding the probability above by 1, we have

$$|||L|||_\infty^2 = \frac{1}{2}\max_X E(L_X - L_{X'})^2 \le \frac{1}{2},$$

so plugging into the Chatterjee-Ledoux formula, we get the following result.


Theorem 3.5.3. Letting $L_X$ denote the length of the longest increasing subsequence of the random walk after evolution under the asymmetric exclusion process, for all $r > 0$ and $q = 1 - c/n^{\alpha}$,

$$P(|L_X - EL_X| \ge r) \le 6\exp\left(-\frac{r}{2}\sqrt{\frac{c^2}{n^{2\alpha}}}\right).$$

This implies that the fluctuations are bounded above by a constant times $n^{\alpha}$. In particular, for $q = 1 - c/\sqrt{n}$, the fluctuations are bounded above by a constant times $\sqrt{n}$.


In order to give some context to the size of the fluctuations, we calculate the height of the midpoint, which gives a lower bound on the length of the longest increasing subsequence of the walk under this distribution.

Theorem 3.5.4. For $q \le 1 - c/n$ and $c = -20\log(3/5)$, the height of the midpoint of the random walk is $kn$ for some constant $k > 0$.

Before we give the proof, we will need the following lemma.

Lemma 3.5.5. Consider a random walk with independent steps. Assume that $P(X_k = 0) = \frac{1}{aq^k + 1}$ and $P(X_k = 1) = \frac{aq^k}{aq^k + 1}$ for some $a > 0$, $q \in (0,1)$ and $k \in \mathbb{Z}_+$. Consider $N_X = \sum_{i=1}^{n} X_i$, which gives us the number of up steps in our random walk, or equivalently, the number of particles in our particle process. The fluctuations of $N_X$ are at most order $\sqrt{n}$.
Proof. We begin by calculating the variance of $N_X$. We can then use Chebyshev’s inequality to bound the fluctuations. Since the $X_i$ are independent,

$$\mathrm{Var}(N_X) = \sum_{i=1}^{n} \mathrm{Var}(X_i).$$


Using the probabilities given in the lemma, we know that 


Var(W) = 


aq 


aq^ + 1 \ag* + 1 


aq 


aq 


aq 


ag* + 1 \ ag* + 1 


This gives 


Var(7V^) = ^ 


aq 


aq 


“ ag* + 1 \ ag* + 1 ^ 

A derivative calculation show that — aq^+i ^ decreasing in i, so 


Var(7Vx) < n 


aq 


aq + 1 


aq 


aq + 1 


Since we only care about the order of the fluctuations, we can bound the positive value

$$\frac{aq}{aq+1}\left(1 - \frac{aq}{aq+1}\right)$$

by 1, giving us

$$\mathrm{Var}(N_X) \le n.$$

Plugging into Chebyshev’s inequality tells us that

$$P(|N_X - E(N_X)| \ge k) \le \frac{n}{k^2},$$

which proves our result. □


We are now set to prove Theorem 3.5.4.


Proof. The basic idea of the proof of theorem 3.5.4 is as follows. We will begin by 
assuming that the steps of our random walk are independent, so that our measure is 
a product measure. Recall, the steps are not independent, since we are conditioning 
on the fact that we have exactly n/2 steps up and n/2 steps down. However, if n is 
large, the steps are close to independent. By bounding the fluctuations of the number 
of particles in our product system, we can then relate our non-independent state to 
the product state. 

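Begin by assuming that

$$\frac{P(X_k = 1)}{P(X_k = 0)} = aq^k,$$

so that we have a product measure. Then we know that

$$P(X_k = 0) = \frac{1}{aq^k + 1}$$

and

$$P(X_k = 1) = \frac{aq^k}{aq^k + 1}.$$

Then

$$E\left(\sum_{i=1}^{k} X_i\right) = \sum_{i=1}^{k} \frac{aq^i}{aq^i + 1}.$$

Since the summand is decreasing in $i$, we get the bounds

$$k\left(\frac{aq^k}{aq^k + 1}\right) \le E\left(\sum_{i=1}^{k} X_i\right) \le k\left(\frac{aq}{aq + 1}\right).$$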


We will work in this generality for now, and add in appropriate values of $a$ and $k$ later. Using this information, we can get bounds on the height of the random walk at point $k$. Let $H_k$ be the height of the random walk at position $k$. For convenience later, we will assume that $X_j = 1$ corresponds to a step down in the walk, and that $X_j = 0$ corresponds to a step up. Provided that we can prove that our height is $cn$ for $c < 0$, our theorem will be proved. We have

$$H_k = \sum_{i=1}^{k} (-1)X_i + \sum_{i=1}^{k} (1 - X_i) = k - 2\sum_{i=1}^{k} X_i.$$

Plugging in our bounds on $E\left(\sum_{i=1}^{k} X_i\right)$, we get

$$-k\left(2\left(\frac{aq}{aq + 1}\right) - 1\right) \le E(H_k) \le -k\left(2\left(\frac{aq^k}{aq^k + 1}\right) - 1\right).$$


At this point, we need a bound on the number of particles in the system. Since we 
are assuming the Xj are independent, we can use the result from the previous lemma, 
which gives us 


P 


n 


X - M 


> u 


< 4exp(— m^/4M) 


\|i=i I / 

where M is a median for the number of particles. Estimating the median by the 
expectation of the number of particles, we see that M should at least be close to 
If we choose a appropriately corresponding to q, we should be able to 
make the constant order 1, making our expectation order n. Then, by the concen¬ 
tration of measure inequality, fluctuations on the order of ^/n. This is 

reasonably small compared with the expected number of particles in the system. 

Recall that we are actually concerned with finding the height of the midpoint, so plugging in $k = n/2$, we have that

$$-\frac{n}{2}\left(2\left(\frac{aq}{aq + 1}\right) - 1\right) \le E(H_{n/2}) \le -\frac{n}{2}\left(2\left(\frac{aq^{n/2}}{aq^{n/2} + 1}\right) - 1\right).$$

At this point, we can ignore the lower bound, using the fact that a lower bound is $-n/2$ anyway, regardless of the configuration. We will refer to our interface as the position in which $P(X = 0) = P(X = 1)$. For now, we will put our interface at $9n/20$,
which will be just to the left of the midpoint. In other words, $a = q^{-9n/20}$ and at position $9n/20$, $P(X = 0) = P(X = 1)$. We will push it to the edge at $n/2$ at the end, since moving the interface to the right only increases the probability of more $X_i$ being equal to 1, hence lowering the expectation of the midpoint. Using this interface, we will first look at the height of the random walk at position $8n/20$. Using the upper bound from above, we have that

$$E(H_{8n/20}) \le -\frac{8n}{20}\left(2\left(\frac{q^{-n/20}}{q^{-n/20} + 1}\right) - 1\right).$$

Beyond this point, if we assume that all of the remaining steps between $8n/20$ and $n/2$ are steps up, we have that

$$E(H_{n/2}) \le -\frac{8n}{20}\left(2\left(\frac{q^{-n/20}}{q^{-n/20} + 1}\right) - 1\right) + \frac{2n}{20}.$$

The important thing to notice here is that this actually gives us an upper bound on the height of the midpoint in the fixed particle number (ASEP) random walk. In the product state configuration, with our interface at $9n/20$, we know that the fluctuations in the number of down steps are less than $\sqrt{n}$. By assuming that all steps after site $8n/20$ are up, we have accounted for the worst case scenario where we actually have $\sqrt{n}$ fewer down steps than we expect. If some of the steps after site $8n/20$ are actually down instead of up, this will only serve to lower the height of our midpoint. Hence, we have that in the ASEP (fixed number of down steps) random walk generated using the blocking measures,

$$E(H_{n/2}) \le -\frac{8n}{20}\left(2\left(\frac{q^{-n/20}}{q^{-n/20} + 1}\right) - 1\right) + \frac{2n}{20}.$$

We would like to show that for an appropriate choice of $q$, this is $cn$ for some constant $c < 0$. This is true provided that

$$8\left(2\left(\frac{q^{-n/20}}{q^{-n/20} + 1}\right) - 1\right) > 2.$$

Solving this inequality gives a condition on $q$, which is

$$q^{-n/20} > \frac{5}{3},$$

or

$$q < \left(\frac{3}{5}\right)^{20/n} = \exp\left(\frac{20}{n}\log(3/5)\right).$$

Taylor expanding the exponential gives

$$\exp\left(\frac{20}{n}\log(3/5)\right) = 1 + \frac{20}{n}\log(3/5) + \frac{200}{n^2}(\log(3/5))^2 + \ldots$$

As $n \to \infty$, taking $q \le 1 - a/n$ with $a = -20\log(3/5)$ should be sufficient. As long as this condition is satisfied, our expectation is $cn$ for a constant $c < 0$.

At this point, we do want to move the interface to $n/2$, taking $a = q^{-n/2}$, such that $P(X_{n/2} = 0) = P(X_{n/2} = 1)$. This simply increases our probability of down steps to the left of the midpoint. Since adding extra down steps only decreases the expectation of the height of the midpoint, the theorem is proved. □


3.6 Remarks 

By generalizing this method introduced by Chatterjee and Ledoux, we are able to show concentration of measure of the empirical spectral distribution not only for operator compressions via $SO(n)$ but also for operators that are “compressed” by conjugation with a Gaussian matrix. It is likely that this method could be applied to a much wider range of Markov chains, provided that the chain does not change too many entries at once, has an appropriate invariant distribution, and has a known spectral gap. It is possible that better bounds for the Gaussian compression could be obtained by adapting the method to use the “second” spectral gap or the exponential decay rate in relative entropy found in [Bonetto et al., 2014].


It is worth noting that Talagrand’s isoperimetric inequality [Talagrand, 1995] gives concentration of measure for the length of the longest increasing subsequence for random permutations, but it cannot be used in the context of this ASEP random walk, as it requires independence. Using Chatterjee and Ledoux’s method, independence is not needed. We only need a spectral gap bound for the Markov chain.
4 Mixed Matrix Moments and Eigenvector Overlap Functions of the Ginibre Ensemble


The purpose of this section is to make some observations about the mixed matrix moments for non-Hermitian random matrices. The results in this chapter can be found in [Walters and Starr, 2015]. Let $\mathrm{Mat}_n(\mathbb{C})$ denote the set of $n \times n$ matrices with complex entries. We use this notation here because we will use $M_n$ for something else later.


The model we will focus on most is the complex Ginibre ensemble, given by

$$A_n \in \mathrm{Mat}_n(\mathbb{C}), \quad A_n = (a_n(j,k))_{j,k=1}^{n}, \quad a_n(j,k) = \frac{X(j,k) + iY(j,k)}{\sqrt{2n}}, \qquad (4.1)$$

where $X(j,k)$, $Y(j,k)$, $1 \le j,k \le n$, are IID $N(0,1)$ real random variables.

Much of what we will say has already been explored by Chalker and Mehlig in a pair of papers [Chalker and Mehlig, 1998, Mehlig and Chalker, 2000], in particular, in their definition of expected overlap functions. There are other models of interest which were explored by Fyodorov and coauthors [Fyodorov and Mehlig, 2002, Fyodorov and Sommers, 2003], for which one can obtain more explicit formulas for the expected overlap functions. Our main emphasis will be to relate Chalker and Mehlig’s formulas for the overlap functions of the complex Ginibre ensemble to the mixed matrix moments.

Our motivation in considering this problem is the following. There is a rough analogy between mean-field spin glasses and random matrices, as far as the mathematical methods are concerned. We indicate this in the table in Figure 4.1. We will give more details and references in a later discussion, but we would like to point out some of the analogies now. This analogy leads to a method to calculate moments, but there is still the question about how to relate the moments to the spectral information for the matrix.

Random Matrices                                  | Spin Glasses
-------------------------------------------------+------------------------------------------------------------
expectations of moments                          | expectations of products of overlaps
recurrence relation for moments                  | stochastic stability equations: Ghirlanda-Guerra identities
formula for Stieltjes transform of limiting law  | proof of Parisi’s ultrametric ansatz

Figure 4.1: Some analogous elements in random matrix and spin glass theory. (Proofs may differ considerably.)


Although the main subject of this chapter is random matrices, we will give a very brief introduction to spin glasses, just to motivate our analogy. Spin glasses are physical objects. We will not say much about the physics behind them, as the subject of this paper is mathematics. However, we will give a quote from Daniel Mattis’s book [Mattis, 2004] in his discussion of dilute magnetic alloys. He says:

“If the impurity atom does possess a magnetic moment this polarizes the conduction electrons in its vicinity by means of the exchange interaction and thereby influences the spin orientation of a second magnetic atom at some distance. Owing to quantum oscillations in the conduction electrons spin polarization the resulting effective interaction between two magnetic impurities at some distance apart can be ferromagnetic (tending to align their spins) or antiferromagnetic (tending to align them in opposite directions). Thus a given magnetic impurity is subject to a variety of ferromagnetic and antiferromagnetic interactions with the various neighboring impurities. What is the state of lowest energy of such a system? This is the topic of an active field of studies entitled spin glasses, the magnetic analog to an amorphous solid.” [Mattis, 2004] (p. 48)


Since this is a mathematics paper, we will consider a spin glass as a probabilistic model. We can consider a system

$$\Sigma_N = \{-1, +1\}^N$$

for a large integer $N$. We call an element $\sigma \in \Sigma_N$ a configuration. The components of $\sigma$ are called spins (and can each take the value $\pm 1$). The energy of the system in a configuration $\sigma$ is called the Hamiltonian, which is usually denoted $H_N(\sigma)$. Given a parameter $\beta$ (the inverse temperature), we can define the Gibbs measure by

$$G_N(\{\sigma\}) = \frac{\exp(-\beta H_N(\sigma))}{Z_N},$$

where $Z_N$ is a normalizing factor, called the partition function. The Gibbs measure is a probability measure which represents the probability of observing the configuration $\sigma$ after the system has reached equilibrium in a heat bath at temperature $1/\beta$. $H_N(\sigma)$ relates to the interactions between the spins. In the models that are often considered, the $H_N(\sigma)$ are random variables. The main problem is to understand the Gibbs measure. See [Talagrand, 2003] for a more in-depth discussion of the probabilistic aspect of spin glasses.


We will depart from our discussion of spin glasses now, to begin the discussion of random matrices. The analogies between the two topics will be discussed in more depth later.


We will start by briefly recalling the formula for the mixed matrix moments of the complex Ginibre ensemble, and we will emphasize the relation to spin glass techniques. This formula is already known and we will give references.

In later sections, we will describe the relationship between the mixed matrix moments and the expected overlap functions of Chalker and Mehlig. This leads to some new problems.


4.1 Mixed Matrix Moments 

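Given any $n \times n$ matrix $A_n$, any positive integer $k$, and any nonnegative integers $p(1), q(1), \ldots, p(k), q(k)$, we may define

$$M_n(\mathbf{p}; \mathbf{q}) = \frac{1}{n}\,\mathrm{tr}\left[A_n^{p(1)}(A_n^*)^{q(1)} \cdots A_n^{p(k)}(A_n^*)^{q(k)}\right], \qquad (4.2)$$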

for p = (p(l),... ,p{k)), q = (g(l),... ,q{k)). Notice that Mq = 1. As an example, 
consider 

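$$M_n((2,2);(2,2)) = \frac{1}{n}\sum_{j_1,\ldots,j_8=1}^{n} a_n(j_1,j_2)\,a_n(j_2,j_3)\,\overline{a_n(j_4,j_3)}\,\overline{a_n(j_5,j_4)}\,a_n(j_5,j_6)\,a_n(j_6,j_7)\,\overline{a_n(j_8,j_7)}\,\overline{a_n(j_1,j_8)}. \qquad (4.3)$$

If we consider the Ginibre ensemble and let $a_n(j,k) = (X(j,k) + iY(j,k))/\sqrt{2n}$ as before, then we have

$$E[a_n(j,k)\,a_n(j',k')] = E[\overline{a_n(j,k)}\,\overline{a_n(j',k')}] = 0 \quad \text{and} \quad E[a_n(j,k)\,\overline{a_n(j',k')}] = n^{-1}\delta_{j,j'}\delta_{k,k'}. \qquad (4.4)$$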

Recall that Wick's rule says that for mean-zero, jointly Gaussian random variables $X_1,\dots,X_n$,

\[ E(X_1 X_2 \cdots X_n) = \sum_{\pi} \prod_{\{i,j\}\in\pi} E(X_i X_j), \]

where the sum is over all distinct ways $\pi$ of dividing $1,\dots,n$ into pairs. Using this, and defining $m_n(p,q) = E[M_n(p,q)]$, gives us:

\[ m_n(p,q) = \sum_{(p',q',p'',q'')\in\mathcal{S}(p,q)} E[M_n(p',q')\,M_n(p'',q'')], \tag{4.5} \]

where $\mathcal{S}(p,q)$ is the set of all admissible pairs, which we describe now. Let $R = p(1) + \dots + p(k) + q(1) + \dots + q(k)$, and define $\sigma = (\sigma(1),\dots,\sigma(R)) \in \{+1,-1\}^R$ as

\[ \sigma = \big((+1)^{p(1)}, (-1)^{q(1)}, \dots, (+1)^{p(k)}, (-1)^{q(k)}\big), \]

that is, $p(1)$ copies of $+1$ followed by $q(1)$ copies of $-1$, and so on, viewed as spins on vertices arranged on a circle. We will sometimes denote this as $\sigma_{p,q}$. Let $S(p,q)$ denote pairs $(\sigma',\sigma'')$ as follows. We match up the first $+1$ and any $-1$. Where these two are removed, we pinch the circle into two smaller circles. Then the remaining spins on the two smaller circles comprise $\sigma'$ and $\sigma''$. E.g., for a particular example (matching the first $+1$ with the first $-1$),

\[ \sigma = (+1,+1,-1,-1,+1,+1,-1,-1) \;\mapsto\; (\sigma',\sigma'') = \big((+1),\,(-1,+1,+1,-1,-1)\big). \tag{4.6} \]

The set $S(p,q)$ is the set of all possible pairs $(\sigma',\sigma'')$ obtainable in this way. We then define $\mathcal{S}(p,q)$ to be the set of all pairs $(p',q')$ and $(p'',q'')$ obtained by mapping each $(\sigma',\sigma'')$ in $S(p,q)$ back to exponent vectors in this way.

Using this, we wish to give the main ideas of the proof of the following theorem.

Theorem 4.1.1. For any $k$ and any $p,q$, we have

\[ \lim_{n\to\infty} m_n(p,q) = m(p,q), \]

where $m(p,q)$ is as follows. Let $C_R$ denote the number of all non-crossing matchings of $R$ vertices on a circle (Catalan's number). Let $m(p,q)$ denote the number of such matchings satisfying the following constraint: assigning spins to the $R$ vertices by $\sigma_{p,q}$, each edge has two endpoints with one $+1$ spin and one $-1$ spin.

As an example, $m((2,2);(2,2)) = 3$, where the matchings are indicated diagrammatically.

Theorem 4.1.1 is a well-known result. We refer to [Kemp et al., 2011] for a discussion.


We will motivate a proof of this result, without including all details, here. Our reason 
is that we actually want to use this result to motivate the discussion of random 
matrices and spin glasses further, which we indicated earlier. 


4.1.1 Argument for the Proof of the Mixed Matrix Moments 


The first step in the argument for the proof of Theorem 4.1.1 is to use concentration of measure (COM) to replace (4.5) with a nonlinear recurrence relation. Here what we mean is non-linearity in the probability measure for the random entries of the matrix. Since the expectation is linear, what we really mean is to obtain a product of two expectations. If $M_n(p',q')$ and $M_n(p'',q'')$ were independent, then we could replace the expectation by a product, but they are not exactly independent. Instead, they satisfy COM, which means that they are approximately non-random. And, of course, non-random variables are exactly independent of every other random variable (as well as themselves).

The easiest version of COM is just $L^2$-concentration. For example, the following lemma is very easy to prove:

Lemma 4.1.2. Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is a function such that $\|\nabla f\|_\infty^2 = \sup_{x\in\mathbb{R}^n} \sum_{k=1}^{n} \big(\tfrac{\partial f}{\partial x_k}(x)\big)^2 < \infty$. Then if $U_1,\dots,U_n,V_1,\dots,V_n$ are IID $\mathcal{N}(0,1)$ random variables then

\[ E\big[(f(U) - f(V))^2\big] \le 2\,\|\nabla f\|_\infty^2. \tag{4.7} \]


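This can be proved using the basic but important method of "quadratic interpolation," which is sometimes called the "smart path method" by mathematicians working on spin glasses.

Proof. Let $Z = (Z_1,\dots,Z_n)$ be an IID $\mathcal{N}(0,1)$ vector, independent of $U$ and $V$. Then define $U(\theta) = \sin(\theta)\,U + \cos(\theta)\,Z$ and $V(\theta) = \sin(\theta)\,V + \cos(\theta)\,Z$. Then $\frac{d}{d\theta}U(\theta) = U(\theta + \frac{\pi}{2})$, and $E[U_j(\theta)\,U_k(\theta + \frac{\pi}{2})] = 0$ for all $j,k$. This means that $U(\theta)$ is statistically independent of its $\theta$-derivative. Similar results hold for $V(\theta)$. On the other hand $E[V_j(\theta)\,U_k(\theta + \frac{\pi}{2})] = -\sin(\theta)\cos(\theta)\,\delta_{jk}$.

Next, using the fundamental theorem of calculus (note that $U(0) = V(0) = Z$, while $U(\pi/2) = U$ and $V(\pi/2) = V$),

\[ E\big[(f(U) - f(V))^2\big] = \int_0^{\pi/2} \frac{d}{d\theta}\, E\Big[\big(f(U(\theta)) - f(V(\theta))\big)^2\Big]\, d\theta, \tag{4.8} \]

and an easy calculation using Gaussian integration by parts (and the covariance formulas mentioned above) shows that

\[ \frac{d}{d\theta}\, E\Big[\big(f(U(\theta)) - f(V(\theta))\big)^2\Big] = 2\sin(2\theta)\, E\big[\nabla f(U(\theta)) \cdot \nabla f(V(\theta))\big]. \tag{4.9} \]

Then (4.7) follows by using the Cauchy-Schwarz inequality, since $\int_0^{\pi/2} 2\sin(2\theta)\,d\theta = 2$.

□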
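As a numerical illustration of Lemma 4.1.2 (not part of the proof), the following sketch estimates $E[(f(U)-f(V))^2]$ by Monte Carlo for two test functions chosen here purely for illustration: a linear function with $\|\nabla f\|_\infty^2 = 1$, for which the bound $2\|\nabla f\|_\infty^2$ is attained, and $f(x) = \sqrt{1+|x|^2}$, whose gradient has norm strictly less than 1. The function names, the dimension and the sample size are assumptions of this sketch.

```python
import math
import random

def mean_square_gap(f, n, samples, rng):
    """Monte Carlo estimate of E[(f(U) - f(V))^2] for independent N(0, I_n) vectors U, V."""
    total = 0.0
    for _ in range(samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += (f(u) - f(v)) ** 2
    return total / samples

def f_linear(x):
    # Gradient is (1, ..., 1) / sqrt(n), so sup |grad f|^2 = 1 and E[(f(U)-f(V))^2] = 2 exactly.
    return sum(x) / math.sqrt(len(x))

def f_norm(x):
    # Gradient is x / sqrt(1 + |x|^2), so |grad f|^2 < 1 everywhere.
    return math.sqrt(1.0 + sum(t * t for t in x))

rng = random.Random(0)  # fixed seed so the run is reproducible
gap_linear = mean_square_gap(f_linear, 10, 20000, rng)
gap_norm = mean_square_gap(f_norm, 10, 20000, rng)
print("linear test function:", gap_linear, "(bound 2)")
print("norm-type test function:", gap_norm, "(bound 2)")
```

The linear case shows that the constant 2 in (4.7) cannot be improved in general; the second case sits well below the bound.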


This is only the simplest Gaussian COM result. Notice that the method of proof is similar to the method used to prove Talagrand's Gaussian concentration of measure inequality for Lipschitz functions, as stated in Chapter 2, Theorem 2.1.1.

This lemma is a tool which can be applied to show that the various mixed matrix moments $M_n(p,q)$ do satisfy COM. We present this lemma here because it is easier to obtain concentration of measure for the matrix moments using this lemma than with Theorem 2.1.1. It should be noted that Theorem 2.1.1 will also work in this case and will give a sharper concentration bound. Either way, it is an interesting calculation, and much of the combinatorics, especially involving matchings related to Catalan's number, is first visible in the grad-squared calculation.

Since the goal of this section is to give a general outline of the proof of the formula for the mixed matrix moments and relate it to spin glass techniques, we will just state that the desired concentration of measure result is true.


Then we are able to boost (4.5) to

\[ \lim_{n\to\infty} \Big[ m_n(p,q) - \sum_{(p',q',p'',q'')\in\mathcal{S}(p,q)} m_n(p',q')\,m_n(p'',q'') \Big] = 0. \tag{4.10} \]

Another easy fact is that, due to symmetry, $m_n(p,q) = 0$ unless $p(1) + \dots + p(k) = q(1) + \dots + q(k)$. And, of course, $m_0 = 1$.

Using this, and the method of induction, one can then prove Theorem 4.1.1.
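The limiting recursion behind (4.10) is easy to run by machine. The sketch below counts the non-crossing matchings of Theorem 4.1.1 in which every edge joins a $+1$ spin to a $-1$ spin. It pairs the first vertex (whatever its sign) with each later vertex of opposite spin and recurses on the two arcs, which gives the same count as matching the first $+1$ with any $-1$. The function names are ours and are only illustrative.

```python
from functools import lru_cache

def spins(p, q):
    """Spin sequence sigma_{p,q}: p(1) copies of +1, then q(1) copies of -1, and so on."""
    sigma = []
    for pj, qj in zip(p, q):
        sigma += [+1] * pj + [-1] * qj
    return tuple(sigma)

@lru_cache(maxsize=None)
def count_matchings(sigma):
    """Number of non-crossing perfect matchings in which each edge joins a +1 to a -1."""
    if not sigma:
        return 1  # empty configuration, corresponding to m_0 = 1
    total = 0
    # Pair the first vertex with a later vertex j of opposite spin; the chord
    # pinches the circle into the two arcs sigma[1:j] and sigma[j+1:].
    for j in range(1, len(sigma)):
        if sigma[j] == -sigma[0]:
            total += count_matchings(sigma[1:j]) * count_matchings(sigma[j + 1:])
    return total

def m_limit(p, q):
    """Limiting mixed matrix moment m(p, q) as described in Theorem 4.1.1."""
    return count_matchings(spins(p, q))

print(m_limit((2, 2), (2, 2)))
```

Arcs of odd length, or with unequal numbers of $+1$ and $-1$ spins, contribute zero automatically, which reproduces the vanishing of $m(p,q)$ unless $p(1)+\dots+p(k) = q(1)+\dots+q(k)$.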


4.1.2 Commentary on Proof Technique 


The quadratic interpolation technique is important in spin glasses. The first major use was by Guerra and Toninelli [Guerra and Toninelli, 2002] and Guerra [Guerra, 2003]. It is called the "smart path method" by Talagrand [Talagrand, 2011]. This is the method which we used to prove Talagrand's Gaussian concentration of measure inequality in Chapter 2.

Using Wick's rule to obtain a recurrence relation is important in many subjects. It is a standard approach to determining moments of random matrices. See, for instance, [Anderson et al., 2010], chapter 1. In the context of Gaussian spin glasses, this technique combined with stochastic stability leads to the Aizenman-Contucci identities [Aizenman and Contucci, 1998]. When combined with concentration of measure it leads to the Ghirlanda-Guerra identities [Ghirlanda and Guerra, 1998]. See, for instance, the review [Contucci and Giardina, 2007].


For random matrices, the problem of recombining the moments into useful information about the limiting empirical spectral measure is also important. For Hermitian random matrices, this is related to the classical moment method. The standard approach is to put the moments together into the Stieltjes transform, and then to proceed from there [Pastur, 1973]. Again, a good general reference is [Anderson et al., 2010], chapter 1.

For spin glasses, the problem of integrating the Ghirlanda-Guerra identities into a useful result for mean-field models was solved only relatively recently. Panchenko showed that the "extended Ghirlanda-Guerra identities" imply Parisi's ultrametric ansatz [Panchenko, 2011]. This is an important work. One element of his proof is putting various terms together into a new exponential-type generating function. This might be somewhat analogous to the Stieltjes transform step. But after that, the proofs are very different.

For non-Hermitian random matrices, getting useful information from the moments is the topic we focus on next.


4.2 The Expected Overlap Functions 


Since the moments $M_n(p,q)$ satisfy concentration of measure, one is primarily interested in their expectations. The next quantity we introduce is also defined just for the expectation. (Studying its distribution may be interesting, but we will not comment on this here.) It is the expected overlap function of Chalker and Mehlig, introduced in [Chalker and Mehlig, 1998] and further studied by them in [Mehlig and Chalker, 2000].


Given $A_n \in \mathrm{Mat}_n(\mathbb{C})$, randomly distributed according to Ginibre's ensemble, almost surely it may be diagonalized. This means that we can find eigenvalues $\lambda_1,\dots,\lambda_n \in \mathbb{C}$ as well as pairs of vectors $\psi_1,\phi_1,\dots,\psi_n,\phi_n \in \mathbb{C}^n$ such that

\[ A_n\psi_k = \lambda_k\psi_k, \qquad \phi_k^* A_n = \lambda_k\phi_k^*, \qquad \phi_j^*\psi_k = \delta_{jk}. \tag{4.11} \]

Using this, for any other vector $\psi \in \mathbb{C}^n$, there is the formula

\[ A_n\psi = \sum_{k=1}^{n} \lambda_k\,\langle\phi_k,\psi\rangle\,\psi_k. \tag{4.12} \]

These are random because they depend on $A_n$, but we may take the expectation over the randomness.

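Given any continuous function, $f$, with compact support on $\mathbb{C}$, one may define

\[ \omega_n^{(1)}[f] = \frac{1}{n}\, E\Big[\sum_{k=1}^{n} f(\lambda_k)\,\langle\psi_k,\psi_k\rangle\langle\phi_k,\phi_k\rangle\Big]. \tag{4.13} \]

Similarly, given any continuous function, $F$, with compact support on $\mathbb{C}\times\mathbb{C}$, we may define

\[ \omega_n^{(2)}[F] = \frac{1}{n}\, E\Big[\sum_{j=1}^{n}\sum_{k\neq j} F(\lambda_j,\lambda_k)\,\langle\psi_k,\psi_j\rangle\langle\phi_j,\phi_k\rangle\Big]. \tag{4.14} \]

Regularity of the eigenvalues and eigenvectors with respect to the matrix entries guarantees existence of functions $\mathcal{O}_n^{(1)} : \mathbb{C}\to\mathbb{C}$ and $\mathcal{O}_n^{(2)} : \mathbb{C}\times\mathbb{C}\to\mathbb{C}$ such that

\[ \omega_n^{(1)}[f] = \int_{\mathbb{C}} f(z)\,\mathcal{O}_n^{(1)}(z)\,d^2z \quad\text{and}\quad \omega_n^{(2)}[F] = \int_{\mathbb{C}}\int_{\mathbb{C}} F(z,w)\,\mathcal{O}_n^{(2)}(z,w)\,d^2z\,d^2w. \tag{4.15} \]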


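Using these definitions, one may determine a relation between these expected overlap functions and the correlation functions for the eigenvalues. Define $\rho_n^{(1)}$ and $\rho_n^{(2)}$ analogously to $\omega_n^{(1)}$ and $\omega_n^{(2)}$ as

\[ \rho_n^{(1)}[f] = \frac{1}{n}\,E\Big[\sum_{k=1}^{n} f(\lambda_k)\Big], \quad\text{and}\quad \rho_n^{(2)}[F] = \frac{1}{n}\,E\Big[\sum_{j=1}^{n}\sum_{k\neq j} F(\lambda_j,\lambda_k)\Big]. \tag{4.16} \]

Then there are functions $\mathcal{R}_n^{(1)} : \mathbb{C}\to\mathbb{C}$ and $\mathcal{R}_n^{(2)} : \mathbb{C}\times\mathbb{C}\to\mathbb{C}$ such that

\[ \rho_n^{(1)}[f] = \int_{\mathbb{C}} f(z)\,\mathcal{R}_n^{(1)}(z)\,d^2z \quad\text{and}\quad \rho_n^{(2)}[F] = \int_{\mathbb{C}}\int_{\mathbb{C}} F(z,w)\,\mathcal{R}_n^{(2)}(z,w)\,d^2z\,d^2w. \tag{4.17} \]

Then

\[ \mathcal{O}_n^{(1)}(z) + \int_{\mathbb{C}} \mathcal{O}_n^{(2)}(z,w)\,d^2w = \mathcal{R}_n^{(1)}(z). \tag{4.18} \]

In terms of these functions, for any nonnegative integers $p$ and $q$,

\[ m_n((p);(q)) = \int_{\mathbb{C}} z^p\,\bar z^q\,\mathcal{O}_n^{(1)}(z)\,d^2z + \int_{\mathbb{C}}\int_{\mathbb{C}} z^p\,\bar w^q\,\mathcal{O}_n^{(2)}(z,w)\,d^2z\,d^2w. \tag{4.19} \]

Therefore, the mixed matrix moments are calculable from the overlap functions. Moreover, the limiting values of the moments give some constraints for the limiting behavior of the overlap functions. It is easy to see that $\mathcal{O}_n^{(1)}(e^{i\theta}z) = \mathcal{O}_n^{(1)}(z)$ and $\mathcal{O}_n^{(2)}(e^{i\theta}z, e^{i\theta}w) = \mathcal{O}_n^{(2)}(z,w)$, consistent with the fact that $m_n((p);(q))$ equals 0 unless $p = q$.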
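The two identities (4.18) and (4.19) both rest on the biorthogonality relations (4.11). A short derivation, under the assumption (ours) that the inner product is $\langle a,b\rangle = a^*b$, so that (4.11) gives $\sum_k \psi_k\phi_k^* = I$, might read as follows.

```latex
% Assumption: <a, b> = a^* b, and (4.11) gives \sum_k \psi_k \phi_k^* = I.
\begin{align*}
\sum_{k=1}^{n} \langle \psi_k, \psi_j\rangle \langle \phi_j, \phi_k\rangle
  &= \phi_j^* \Big(\sum_{k=1}^{n} \phi_k \psi_k^*\Big) \psi_j
   = \phi_j^* \psi_j = 1, \\
\frac{1}{n}\operatorname{tr}\big[A_n^{p} (A_n^*)^{q}\big]
  &= \frac{1}{n}\sum_{j,k=1}^{n} \lambda_j^{p}\,\bar\lambda_k^{q}\,
     \langle \psi_k, \psi_j\rangle \langle \phi_j, \phi_k\rangle .
\end{align*}
```

Splitting the first sum into the term $k = j$ and the terms $k \neq j$ gives the sum rule (4.18). Taking expectations in the second line and splitting the double sum the same way gives (4.19).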


4.3 Formulas for the Overlap Functions 

Chalker and Mehlig were able to relate the overlap functions to expectations of functions involving all the eigenvalues. The eigenvalue distribution for the complex Ginibre ensemble is well-known. In fact it is one of the simplest of the various Gaussian ensembles. For example, as Chalker and Mehlig also point out in their paper,

\[ \mathcal{R}_N^{(1)}(z) = \pi^{-1}\,[(N-1)!]^{-1}\,\exp(-N|z|^2)\,D_{N-1}(z), \tag{4.20} \]

where $D_{N-1}(z)$ equals the determinant of the $(N-1)$-dimensional square matrix $\mathcal{D}_{N-1}(z)$ where the matrix entries are best indexed for $j,k \in \{0,\dots,N-2\}$ as

\[ [\mathcal{D}_{N-1}(z)]_{jk} = \Big(\frac{N^{j+k+4}}{\pi^2\,j!\,k!}\Big)^{1/2} \int_{\mathbb{C}} \bar\lambda^j\,\lambda^k\,|z-\lambda|^2\,\exp(-N|\lambda|^2)\,d^2\lambda. \tag{4.21} \]

By rotational invariance of all the terms in the integrand other than $|z-\lambda|^2$, which is only quadratic, it happens that $\mathcal{D}_{N-1}(z)$ is a tridiagonal matrix. Hence, Chalker and Mehlig point out that it is easy to derive a recursion relation for $D_{N-1}(z)$. It is easier to define a new quantity $D_{N-1}(\sigma^{-2},z) = D_{N-1}(\sigma^{-1}N^{-1/2}z)$. Then they show

\[ D_{n+1}(\sigma^{-2},z) = (\sigma^{-2}|z|^2 + n + 1)\,D_n(\sigma^{-2},z) - \sigma^{-2}\,n\,|z|^2\,D_{n-1}(\sigma^{-2},z), \tag{4.22} \]


and $D_0(\sigma^{-2},z) = 1$, $D_1(\sigma^{-2},z) = 1 + \sigma^{-2}|z|^2$. It turns out to be easy to solve this recurrence relation, and Chalker and Mehlig give the formula

\[ D_{N-1}(\sigma^{-2},z) = (N-1)! \sum_{n=0}^{N-1} \frac{(\sigma^{-2}|z|^2)^n}{n!}, \tag{4.23} \]

which is the partial sum for the series for $(N-1)!\,\exp(\sigma^{-2}|z|^2)$. In order to obtain $D_{N-1}(z)$ one must take $\sigma^{-2} = N$. One sees that the dividing line is $|z| < 1$ versus $|z| > 1$, as to whether enough terms have been included in the partial sum to get essentially $\exp(N|z|^2)$ or not. From this it follows that the measure $\mathcal{R}_N^{(1)}(z)\,d^2z$ converges weakly to $\pi^{-1}\,1_{[0,1]}(|z|)\,d^2z$, as $N\to\infty$. The reason for going into so much detail in this example is that the other examples are similar, but harder. In fact, some of the quantities are so complicated that so far they have eluded any explicit, exact formula (at least as far as we have been able to find in the literature).
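The claim that the three-term recursion (4.22) is solved by the partial exponential sum (4.23) can be checked in exact rational arithmetic. In the sketch below, `x` stands for $\sigma^{-2}|z|^2$; the function names are ours and purely illustrative.

```python
from fractions import Fraction
from math import factorial

def d_by_recursion(n_max, x):
    """Iterate D_{n+1} = (x + n + 1) D_n - x n D_{n-1} with D_0 = 1, D_1 = 1 + x."""
    d = [Fraction(1), 1 + x]
    for n in range(1, n_max):
        d.append((x + n + 1) * d[n] - x * n * d[n - 1])
    return d[: n_max + 1]

def d_closed_form(n, x):
    """n! times the n-th partial sum of the exponential series, as in (4.23)."""
    return factorial(n) * sum(x ** m / factorial(m) for m in range(n + 1))

x = Fraction(7, 3)  # an arbitrary rational test value for sigma^{-2} |z|^2
recursion_values = d_by_recursion(12, x)
closed_values = [d_closed_form(n, x) for n in range(13)]
print(recursion_values == closed_values)
```

The agreement follows from the identity $D_n = n\,D_{n-1} + x^n$ satisfied by the closed form, which turns the right side of (4.22) into $(n+1)D_n + x^{n+1}$.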

Another easy result which follows from these explicit formulas, but which does not appear in the paper of Chalker and Mehlig, is the scaling formula near the unit circle. Let us record this for later reference.

Lemma 4.3.1. For any $u \in \mathbb{R}$,

\[ \lim_{N\to\infty} \mathcal{R}_N^{(1)}(1 - N^{-1/2}u) = \pi^{-1}\,\Phi(2u), \quad\text{where}\quad \Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-z^2/2}\,dz. \tag{4.24} \]

Proof. Given the exact formula,

\[ \mathcal{R}_N^{(1)}(1 - N^{-1/2}u) = \pi^{-1}\exp\big(-N(1 - N^{-1/2}u)^2\big) \sum_{n=0}^{N-1} \frac{\big[N(1 - N^{-1/2}u)^2\big]^n}{n!}, \tag{4.25} \]

make the substitution $n = N - N^{1/2}x$ for $x \in \{N^{-1/2}, 2N^{-1/2}, \dots, N^{1/2}\}$, and use Stirling's formula. Then replace the sum by an appropriate integral in $x$ (of which it is a Riemann sum approximation with $\Delta x = N^{-1/2}$) by using the rigorous Euler-Maclaurin summation formula. □

We may note that using the Euler-Maclaurin summation formula, one may obtain more terms as corrections of the leading-order term, just as one does for the asymptotic series in Stirling's formula. Additionally, one may obtain formulas that are valid for more values of $u$: one may obtain an asymptotic formula for $\mathcal{R}_N^{(1)}(z)$ assuming that $|z| - 1 < CN^{-1/2}$ for some $C$, and another formula for $\mathcal{R}_N^{(1)}(z)$ assuming that $|z| - 1 > -CN^{-1/2}$ for some $C$, the difference being whether one chooses to asymptotically evaluate the terms which are present in the partial sum for $\exp(N|z|^2)$ or whether one chooses to asymptotically evaluate the terms which are absent in that partial sum.
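The edge scaling of Lemma 4.3.1 can be observed numerically. The sketch below evaluates $\pi\,\mathcal{R}_N^{(1)}(z) = e^{-N|z|^2}\sum_{n<N}(N|z|^2)^n/n!$ at $|z| = 1 - N^{-1/2}u$, summing in log space to avoid overflow, and compares it with $\Phi(2u)$. The value of $N$, the sample values of $u$ and the function names are assumptions made for illustration only.

```python
import math

def pi_times_r1(N, r):
    """pi * R_N^{(1)}(z) for |z| = r: exp(-N r^2) * sum_{n<N} (N r^2)^n / n!."""
    lam = N * r * r
    return math.fsum(
        math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1)) for n in range(N)
    )

def std_normal_cdf(x):
    """Phi(x), the standard Gaussian distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def edge_comparison(N, u):
    """Return (finite-N value, limiting value Phi(2u)) at |z| = 1 - u / sqrt(N)."""
    return pi_times_r1(N, 1.0 - u / math.sqrt(N)), std_normal_cdf(2.0 * u)

for u in (-1.0, 0.0, 0.5, 1.0):
    print(u, edge_comparison(10_000, u))
```

The remaining discrepancy at finite $N$ is of order $N^{-1/2}$, in line with the remark above that Euler-Maclaurin yields further correction terms.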

4.3.1 More Involved Formulas: 

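The formula for $\mathcal{O}_N^{(1)}$ is not much more complicated than the formula for $\mathcal{R}_N^{(1)}$, and Chalker and Mehlig gave the explicit answer. It turns out that one may write $\mathcal{O}_N^{(1)}(z)$ similarly to $\mathcal{R}_N^{(1)}(z)$ as

\[ \mathcal{O}_N^{(1)}(z) = \pi^{-1}\,[(N-1)!]^{-1}\,\exp(-N|z|^2)\,G_{N-1}(z), \quad\text{where}\quad G_{N-1}(z) = \det[\mathcal{G}_{N-1}(z)], \]

\[ \forall j,k \in \{0,\dots,N-2\}, \quad [\mathcal{G}_{N-1}(z)]_{jk} = \Big(\frac{N^{j+k+4}}{\pi^2\,(j!)(k!)}\Big)^{1/2} \int_{\mathbb{C}} \bar\lambda^j\,\lambda^k\,\big(N^{-1} + |z-\lambda|^2\big)\,\exp(-N|\lambda|^2)\,d^2\lambda. \tag{4.26} \]

The matrix $\mathcal{G}_{N-1}(z)$ is also tridiagonal for the same reason as $\mathcal{D}_{N-1}(z)$. In particular, there is again a recursion relation for $G_{N-1}(z)$. Defining $G_{N-1}(\sigma^{-2},z) = G_{N-1}(\sigma^{-1}N^{-1/2}z)$, one may see the recursion formula

\[ \begin{aligned} G_{n+1}(\sigma^{-2},z) &= [\mathcal{G}(\sigma^{-2},z)]_{n,n}\,G_n(\sigma^{-2},z) - [\mathcal{G}(\sigma^{-2},z)]_{n,n-1}\,[\mathcal{G}(\sigma^{-2},z)]_{n-1,n}\,G_{n-1}(\sigma^{-2},z) \\ &= (\sigma^{-2}|z|^2 + n + 2)\,G_n(\sigma^{-2},z) - \sigma^{-2}\,n\,|z|^2\,G_{n-1}(\sigma^{-2},z), \end{aligned} \tag{4.27} \]

with $G_0(\sigma^{-2},z) = 1$ and $G_1(\sigma^{-2},z) = 2 + \sigma^{-2}|z|^2$.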


Lemma 4.3.2. The exact solution to the recursion relation when $\sigma^{-2} = N$ is

\[ G_{N-1}(z) = (N-1)! \sum_{n=0}^{N-1} (N-n)\,\frac{(N|z|^2)^n}{n!}. \tag{4.28} \]

Using this formula, it is easy to see that $N^{-1}\mathcal{O}_N^{(1)}(z)\,d^2z$ converges weakly to $\pi^{-1}(1-|z|^2)\,1_{[0,1]}(|z|^2)\,d^2z$, which is precisely the behavior that Chalker and Mehlig found by other techniques. We will return to their approach shortly. For now, let us state the analogue of Lemma 4.3.1.


Corollary 4.3.3. For any $u \in \mathbb{R}$,

\[ \mathcal{O}_N^{(1)}(1 - N^{-1/2}u) \sim \frac{N^{1/2}}{\pi}\Big[\frac{1}{\sqrt{2\pi}}\,e^{-2u^2} + 2u - 2u\,\Phi(-2u)\Big], \quad\text{as } N\to\infty. \tag{4.29} \]

Proof. The proof is perfectly analogous to the proof of Lemma 4.3.1, except we start with Lemma 4.3.2 instead of equation (4.23). □
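Both the exact solution of the recursion (4.27) and the edge behavior of $\mathcal{O}_N^{(1)}$ can be checked by machine. The sketch below assumes the normalization $\mathcal{O}_N^{(1)}(z) = \pi^{-1}[(N-1)!]^{-1}e^{-N|z|^2}G_{N-1}(z)$, so that $\pi\,\mathcal{O}_N^{(1)}(z) = E[(N-X)^+]$ for a Poisson variable $X$ of mean $N|z|^2$. It compares the rescaled value at $|z| = 1 - N^{-1/2}u$ with the Gaussian limit $(2\pi)^{-1/2}e^{-2u^2} + 2u\,\Phi(2u)$, which is how we read Corollary 4.3.3. The function names and test values are ours.

```python
import math
from fractions import Fraction
from math import factorial

def g_by_recursion(n_max, x):
    """Iterate G_{n+1} = (x + n + 2) G_n - x n G_{n-1} with G_0 = 1, G_1 = 2 + x."""
    g = [Fraction(1), 2 + x]
    for n in range(1, n_max):
        g.append((x + n + 2) * g[n] - x * n * g[n - 1])
    return g[: n_max + 1]

def g_closed_form(n, x):
    """n! * sum_{m<=n} (n + 1 - m) x^m / m!, i.e. (4.28) with n = N - 1."""
    return factorial(n) * sum((n + 1 - m) * x ** m / factorial(m) for m in range(n + 1))

def scaled_o1(N, u):
    """pi * O_N^{(1)}(z) / sqrt(N) at |z| = 1 - u / sqrt(N), from the closed form."""
    lam = N * (1.0 - u / math.sqrt(N)) ** 2
    s = math.fsum(
        (N - n) * math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))
        for n in range(N)
    )
    return s / math.sqrt(N)

def edge_profile(u):
    """Gaussian limit (2 pi)^{-1/2} exp(-2 u^2) + 2 u Phi(2u)."""
    density = math.exp(-2.0 * u * u) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(2.0 * u / math.sqrt(2.0)))
    return density + 2.0 * u * cdf

x = Fraction(5, 2)
print(g_by_recursion(10, x) == [g_closed_form(n, x) for n in range(11)])
for u in (-0.5, 0.0, 0.5):
    print(u, scaled_o1(10_000, u), edge_profile(u))
```

Since $2u\,\Phi(2u) = 2u - 2u\,\Phi(-2u)$, the profile can equally be written with $\Phi(-2u)$.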


One also needs the two-point function $\mathcal{O}_N^{(2)}$ in order to obtain any interesting moments. The two-point function for the eigenvalues is easier to start with since its distribution is known exactly. Using ideas related to the theory of orthogonal polynomials, one may see that $\mathcal{R}_N^{(2)}(z_1,z_2)$ is determinantal. The canonical general reference for this is [Mehta, 2004]. One may write the formula as


Af-l 


(u,^ 2 ) = vr <let{K]S!{zjZk)f.^k=i 


n=0 


(Nzy 


n\ 


(4.30) 


From this one may determine the following asymptotics, proved in the same way as 
before. 

Lemma 4.3.4. Define C^\zi,Z 2 ) = 2 : 2 ) —the corrected 

correlation function for the eigenvalues. Then for any fixed ui,U 2 E C 




where the definition of $ is extended to the complex plane as 

poo 

$(-«) = (27r)-^/2g-nV2 / 

Jo 

We have stated a somewhat precise limit for $\mathcal{R}_N^{(2)}$, but we do not know how to get a precise limit for $\mathcal{O}_N^{(2)}$. Let us state one of Chalker and Mehlig's main results as a conjecture. In other words, they give a good argument for the calculation of $\mathcal{O}_N^{(2)}$ which is highly plausible on the basis of mathematical reasoning, but to the best of our knowledge their result has not yet been fully rigorously proved.

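Conjecture 4.3.5 (Chalker and Mehlig). (i) For any two points $z_1, z_2$ such that $|z_1|, |z_2| < 1$ and $z_1 \neq z_2$,

\[ \mathcal{O}_N^{(2)}(z_1,z_2) \xrightarrow{N\to\infty} -\frac{1}{\pi^2}\,\frac{1 - z_1\bar z_2}{|z_1 - z_2|^4}. \tag{4.32} \]

(ii) For any $\omega \in \mathbb{C}$ and $z$ such that $|z| < 1$,

\[ \mathcal{O}_N^{(2)}\Big(z + \frac{\omega}{2N^{1/2}},\, z - \frac{\omega}{2N^{1/2}}\Big) \sim -\frac{N^2}{\pi^2}\,(1 - |z|^2)\,\frac{1 - (1+|\omega|^2)\,e^{-|\omega|^2}}{|\omega|^4}, \quad\text{as } N\to\infty. \tag{4.33} \]

Importantly, there is no asymptotic formula for $z_1$ and $z_2$ near the boundary of the circle. For all the other cases, this regime gives lower-order corrections, beyond the leading order.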

Chalker and Mehlig's approach is beautiful and compelling. They calculated an explicit formula for $\mathcal{O}_N^{(2)}(0,z)$. Note, for instance, that $\mathcal{R}_N^{(2)}(0,z)$ is a constant multiple of $\mathcal{R}_N^{(1)}(z) - \pi^{-1}e^{-N|z|^2}$, so the formula simplifies when one of the arguments is 0. A similar fact holds for $\mathcal{O}_N^{(2)}(z_1,z_2)$, even though it seems that it is not determinantal like $\mathcal{R}_N^{(2)}(z_1,z_2)$. Then, Chalker and Mehlig considered a universality-type argument to see how the functional form should behave under transformations of the point 0 to other places on the circle. Their argument is also a universal argument, applying to more ensembles than just the complex Ginibre ensemble, but we will continue to consider just the complex Ginibre ensemble here.

The second part of their argument is the key to their formula. The function $\mathcal{O}_N^{(2)}(z_1,z_2)$ may be expressed as the expectation of a non-local function of all the eigenvalues of $A_N$. Chalker and Mehlig observe that the function depends mainly on the eigenvalues in a small core area around $z_1$ and $z_2$. For this core, the distribution of the eigenvalues should be universal, not depending on the proximity of $z_1$ and $z_2$ to the boundary of the disk, as long as they are not near the boundary. Then outside the core there is a self-averaging contribution of all the other eigenvalues, which may be reduced to a Riemann integral approximation, and calculated. That part does depend on the geometry of the point configuration in the disk, but it is easily calculated. Putting these two parts together with their formula for $\mathcal{O}_N^{(2)}(0,z)$, they were able to arrive at (4.33).

The reader is advised most strongly to consult their beautiful paper.


Now we want to explain briefly the first part of their argument since it is a basis for a different proposal we have for how to prove their conjecture. Chalker and Mehlig point out that $\mathcal{O}_N^{(2)}(z_1,z_2)$ may be calculated as the determinant of a 5-diagonal matrix. In fact, it is easier to start with $\mathcal{R}_N^{(2)}(z_1,z_2)$:




ATS 

TT^m 


\zi-Z2\^e , 


(4.34) 


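where $F_{N-2}(z_1,z_2)$ equals the determinant of the $(N-2)$-dimensional square matrix $\mathcal{F}_{N-2}(z_1,z_2)$, where

\[ [\mathcal{F}_{N-2}(z_1,z_2)]_{jk} = \Big(\frac{N^{j+k+6}}{\pi^2\,(j+1)!\,(k+1)!}\Big)^{1/2} \int_{\mathbb{C}} \bar\lambda^j\,\lambda^k\,|z_1-\lambda|^2\,|z_2-\lambda|^2\,\exp(-N|\lambda|^2)\,d^2\lambda, \tag{4.35} \]

for $j,k = 0,\dots,N-3$. Then the formula for $\mathcal{O}_N^{(2)}(z_1,z_2)$ is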

0<^\zu^2) = ( 4 . 36 ) 


where $H_{N-2}(z_1,z_2)$ equals the determinant of the $(N-2)$-dimensional square matrix $\mathcal{H}_{N-2}(z_1,z_2)$, where

\[ [\mathcal{H}_{N-2}(z_1,z_2)]_{jk} = \Big(\frac{N^{j+k+6}}{\pi^2\,(j+1)!\,(k+1)!}\Big)^{1/2} \int_{\mathbb{C}} \bar\lambda^j\,\lambda^k\,\Big[|z_1-\lambda|^2\,|z_2-\lambda|^2 + N^{-1}(\bar z_1 - \bar\lambda)(z_2 - \lambda)\Big]\,\exp(-N|\lambda|^2)\,d^2\lambda, \tag{4.37} \]

for $j,k = 0,\dots,N-3$. These are naturally 5-diagonal because of rotational invariance. However, notice that if $z_1 = 0$ or $z_2 = 0$ then they become tridiagonal again. Hence, they are more easily calculable in that case. That is why $\mathcal{O}_N^{(2)}(0,z)$ is calculable.

In a later section, we are going to propose another method to proceed. We will write down the recursion relation for the 5-diagonal matrix, which is harder than for a tridiagonal matrix. Then, even if the formula is not exactly solvable, we argue that it should be asymptotically solvable. We give more details in a later section, in particular carrying out the asymptotic approach for the easier problem of calculating $\mathcal{R}_N^{(1)}(z)$ (which we may check against the exact solution).


4.4 Moments and Constraints on the Overlap Functions

An ideal situation would be to find an explicit sum-formula for $\mathcal{O}_N^{(2)}(z_1,z_2)$, just as Lemma 4.3.2 provides for $\mathcal{O}_N^{(1)}(z)$, but so far, this has not been discovered. In the next section, we will suggest a rigorous approach which may work to give the asymptotics, even when no explicit formula is known. For now, let us state the constraints imposed by the moment formula from before.


Recall from (4.19) that for any nonnegative integers $p$ and $q$,

\[ m_N((p);(q)) = \int_{\mathbb{C}} z^p\,\bar z^q\,\mathcal{O}_N^{(1)}(z)\,d^2z + \int_{\mathbb{C}}\int_{\mathbb{C}} z^p\,\bar w^q\,\mathcal{O}_N^{(2)}(z,w)\,d^2z\,d^2w. \]

Moreover, from the discussion at the end of Section 4.1.1, $m_N((p);(q))$ equals 0 unless $p = q$, and as noted at the end of Section 4.2, this is already reflected in the rotational invariance properties of $\mathcal{O}_N^{(1)}(z)$ and $\mathcal{O}_N^{(2)}(z_1,z_2)$. Therefore, specializing, we see that

\[ \int_{\mathbb{C}} |z|^{2p}\,\mathcal{O}_N^{(1)}(z)\,d^2z + \int_{\mathbb{C}}\int_{\mathbb{C}} z_1^p\,\bar z_2^p\,\mathcal{O}_N^{(2)}(z_1,z_2)\,d^2z_1\,d^2z_2 = 1, \tag{4.38} \]

for each nonnegative integer $p$, up to corrections that vanish as $N\to\infty$ (Theorem 4.1.1 gives $m((p);(p)) = 1$). This is the constraint formula. Let us now analyze this formula, starting with the leading order terms, and going down in order.

4.4.1 Cancelling Divergences at Leading Order 

For any fixed $z$ with $|z| < 1$, we have

\[ \mathcal{O}_N^{(1)}(z) \sim N\pi^{-1}(1 - |z|^2), \tag{4.39} \]

and the corrections are actually exponentially small in $N$ (since they arise as the deep part of the right tail of the series for the exponential). Therefore, integrating, we obtain the leading-order part of the contribution from $\mathcal{O}_N^{(1)}$ to the formula above:

\[ \int_{\mathbb{C}} |z|^{2p}\,\mathcal{O}_N^{(1)}(z)\,d^2z \sim N\pi^{-1}\int_{\mathbb{C}} |z|^{2p}(1-|z|^2)\,1_{[0,1)}(|z|^2)\,d^2z. \tag{4.40} \]

The corrections to this formula are not exponentially small, incidentally. This is because the approximating formula for $\mathcal{O}_N^{(1)}(z)$ is not exponentially close to the exact formula for all $z$ in the complex plane. For a fixed $|z| > 1$ it is easy to see that $\mathcal{O}_N^{(1)}(z)$ is exponentially small (hence exponentially close to the approximating function of 0 there). That is because one only has the series for the exponential up to a small number of terms, deep in the left tail. Near the circle, there are algebraic corrections, not exponential ones.

Nevertheless, let us note that, by making a polar decomposition, $z = re^{i\theta}$, we obtain

\[ N\pi^{-1}\int_{\mathbb{C}} |z|^{2p}(1-|z|^2)\,1_{[0,1)}(|z|^2)\,d^2z = N\int_0^1 t^p(1-t)\,dt = \frac{N}{(p+1)(p+2)}. \tag{4.41} \]

Let us see how this cancels with the leading-order part of the $\mathcal{O}_N^{(2)}$ integral.

We will use Chalker and Mehlig's formula here for the leading-order part, even though we do not yet know the corrections for the lower-order part near the circle. Then we get

\[ \int_{\mathbb{C}}\int_{\mathbb{C}} z_1^p\,\bar z_2^p\,\mathcal{O}_N^{(2)}(z_1,z_2)\,d^2z_1\,d^2z_2 \sim -\frac{N^2}{\pi^2}\int_{\mathbb{C}}\int_{\mathbb{C}} \Big(z + \frac{\omega}{2N^{1/2}}\Big)^p \Big(\bar z - \frac{\bar\omega}{2N^{1/2}}\Big)^p (1-|z|^2)\,1_{[0,1)}(|z|^2)\,\frac{1 - (1+|\omega|^2)\,e^{-|\omega|^2}}{|\omega|^4}\,N^{-1}\,d^2\omega\,d^2z, \tag{4.42} \]

where the $N^{-1}$ associated to the volume element $d^2\omega\,d^2z$ is to account for the Jacobian of the transformation from $(z_1,z_2)$ to $(z,\omega)$. Now we will begin to separate this formula into yet another decomposition into leading terms and sub-leading terms. This is because, in the formulas $z_1^p = (z + \frac12 N^{-1/2}\omega)^p$ and $\bar z_2^p = (\bar z - \frac12 N^{-1/2}\bar\omega)^p$, clearly the leading order arises by ignoring the contributions of $\omega$, which are each accompanied by negative powers of $N$. We really obtain what we might call the "leading order, leading order" term:

\[ \int_{\mathbb{C}}\int_{\mathbb{C}} z_1^p\,\bar z_2^p\,\mathcal{O}_N^{(2)}(z_1,z_2)\,d^2z_1\,d^2z_2 \sim -N^2\pi^{-2}\int_{\mathbb{C}}\int_{\mathbb{C}} |z|^{2p}(1-|z|^2)\,1_{[0,1)}(|z|^2)\,\frac{1 - (1+|\omega|^2)\,e^{-|\omega|^2}}{|\omega|^4}\,N^{-1}\,d^2\omega\,d^2z. \tag{4.43} \]

Then it is easy to see that this splits. The integral over $\omega$ is

\[ \int_{\mathbb{C}} \frac{1 - (1+|\omega|^2)\,e^{-|\omega|^2}}{|\omega|^4}\,d^2\omega = \pi\int_0^\infty \frac{1 - (1+t)\,e^{-t}}{t^2}\,dt. \tag{4.44} \]

Integrating by parts, it is easy to see that this gives $\pi$. Therefore, we end up with the exact negative of the leading order contribution by $\mathcal{O}_N^{(1)}$:

\[ \int_{\mathbb{C}}\int_{\mathbb{C}} z_1^p\,\bar z_2^p\,\mathcal{O}_N^{(2)}(z_1,z_2)\,d^2z_1\,d^2z_2 \sim -N\pi^{-1}\int_{\mathbb{C}} |z|^{2p}(1-|z|^2)\,1_{[0,1)}(|z|^2)\,d^2z = -\frac{N}{(p+1)(p+2)}. \tag{4.45} \]

The fact that these two terms cancel is good, because each diverges separately, whereas, according to the formula, the exact answer is supposed to be 1.
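For completeness, the integration by parts behind the value $\pi$ of the $\omega$-integral can be written out. After passing to polar coordinates and substituting $t = |\omega|^2$, what remains is the one-dimensional integral below; the boundary term vanishes because the numerator behaves like $t^2/2$ near 0 and is bounded at infinity.

```latex
\begin{align*}
\int_0^\infty \frac{1-(1+t)e^{-t}}{t^2}\,dt
  &= \Big[-\frac{1-(1+t)e^{-t}}{t}\Big]_0^\infty
     + \int_0^\infty \frac{1}{t}\,\frac{d}{dt}\big(1-(1+t)e^{-t}\big)\,dt \\
  &= 0 + \int_0^\infty \frac{t\,e^{-t}}{t}\,dt = 1 .
\end{align*}
```

Multiplying by the factor $\pi$ from the angular integration gives $\pi$, which turns the prefactor $-N\pi^{-2}$ into $-N\pi^{-1}$.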


4.4.2 The Sub-Leading Contribution from $\mathcal{O}_N^{(1)}$

For the first integral, we are fortunate that the exact correction is known near the circle. We will not attempt to keep track of the exponentially small corrections which are present away from the circle. Near the circle, the exact corrections are relevant because they are not exponentially small.

Using Corollary 4.3.3, we know that


\z\^PO^i\z) cfz 


N 


{p + l){p + 2) 


7VV2 


71 




— 2m<F(—2m) 


\z\‘^P(fz + o{l), (4.46) 


«=7V1/2(1_|2|) 


where the small term $o(1)$ means that the remainder converges to 0 as $N\to\infty$. This remainder includes exponentially small corrections to $\mathcal{O}_N^{(1)}(z)$ away from the circle, as well as the systematic correction terms to the leading-order behavior near the circle that arise from the Euler-Maclaurin series. The reason that these correction terms to the Euler-Maclaurin summation formula are $o(1)$ will arise momentarily: even the leading order term is only order-1, constant.

Making the polar decomposition of $z$ and then rewriting $r = 1 - N^{-1/2}u$ so that $dr = -N^{-1/2}\,du$ (and reversing orientation of the integral), we have


\z\^PO^i\z) dh- 


N 


= 2 


„Ari/2 


/ —OO 

2ui( 2„)J 


{p + l){p + 2) 
(1 - du + o(l) 


(4.47) 


= 2 






2u^{-2u) 


du -|- o(l). 
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In particular, this correction is independent of p, modulo vanishingly small remainder 
terms which are accumulated in the $o(1)$. Rewriting $u = x/2$ and integrating by parts gives a constant which is equal to 1/2. (As a check, for $p = 0$ Lemma 4.3.2 gives $\int_{\mathbb{C}} \mathcal{O}_N^{(1)}(z)\,d^2z = (N+1)/2$ exactly, which is $N/2$ plus this constant.)

We will not be able to make it to the order-1, constant terms in the $N\to\infty$ asymptotic series (in decreasing powers of $N$). The reason is that for $\mathcal{O}_N^{(2)}$ we do not have sufficiently precise asymptotics to get to that level. Instead, what we will do next is to consider what constraints the formula for the moments imposes on $\mathcal{O}_N^{(2)}$.
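The order-1 contribution of the one-point overlap function can be computed exactly from the sum formula of Lemma 4.3.2, without any asymptotic analysis. The sketch below assumes the normalization $\mathcal{O}_N^{(1)}(z) = \pi^{-1}e^{-N|z|^2}\sum_{n<N}(N-n)(N|z|^2)^n/n!$ and integrates term by term, using $\int_{\mathbb{C}} |z|^{2p}e^{-N|z|^2}(N|z|^2)^n\,d^2z/n! = \pi\,(n+p)!/(n!\,N^{p+1})$. The function names are ours.

```python
from fractions import Fraction
from math import factorial

def o1_moment(N, p):
    """Exact integral of |z|^(2p) O_N^(1)(z) over C, under the normalization stated above."""
    total = sum(Fraction((N - n) * factorial(n + p), factorial(n)) for n in range(N))
    return total / Fraction(N) ** (p + 1)

def remainder(N, p):
    """Exact moment minus the leading-order term N / ((p+1)(p+2)) of (4.41)."""
    return o1_moment(N, p) - Fraction(N, (p + 1) * (p + 2))

for p in range(4):
    print(p, float(remainder(400, p)))
```

For $p = 0$ the remainder is exactly $1/2$ for every $N$, and for $p = 1$ it is $1/2 + 1/(3N)$; in general it differs from $1/2$ by a term of order $1/N$, so the limit does not depend on $p$.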


4.4.3 Sub-Leading Divergences in the $\mathcal{O}_N^{(2)}$ Term

We have now accounted for all the non-vanishing contributions from the $\mathcal{O}_N^{(1)}$ term. The leading-order divergence cancels with the leading-order divergence of the $\mathcal{O}_N^{(2)}$ term. The sub-leading order part of the contribution to the moment is already order-1, constant, and it is independent of $p$. It equals 1/2. Note that the moment itself is also independent of $p$: it is 1.

Since we do not know the actual formula for $\mathcal{O}_N^{(2)}$, our plan for this section is to consider the proposed formula for $\mathcal{O}_N^{(2)}$ in the bulk. That still leads to one other divergent contribution, diverging logarithmically in $N$. What this must mean is that in the formula for $\mathcal{O}_N^{(2)}(z_1,z_2)$ for $z_1$ and $z_2$ close, and both near the circle, there must be an edge correction, which leads to a counter-balancing divergence. This is what we explain in some more detail now. This subsection is detailed and technical.

We consider the proposed formula for $\mathcal{O}_N^{(2)}$ that Chalker and Mehlig derived. This is the correct formula in the bulk, following the argument of their paper, although there is a lower-order correction near the circle. We will not include the correction on the circle. Instead our calculations will show constraints that must be satisfied for this correction formula. We use $z_1 = z + \frac12 N^{-1/2}\omega$ and $z_2 = z - \frac12 N^{-1/2}\omega$ so that

\[ z_1\bar z_2 = |z|^2 + \frac{i\,\mathrm{Im}[\omega\bar z]}{N^{1/2}} - \frac{|\omega|^2}{4N}. \tag{4.48} \]

Therefore, using the bulk formula we would have



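\[ \int_{\mathbb{C}}\int_{\mathbb{C}} z_1^p\,\bar z_2^p\,\mathcal{O}_N^{(2)}(z_1,z_2)\,d^2z_1\,d^2z_2 \approx -\frac{N^2}{\pi^2}\int_{\mathbb{C}}\int_{\mathbb{C}} \Big[|z|^2 + \frac{i\,\mathrm{Im}[\omega\bar z]}{N^{1/2}} - \frac{|\omega|^2}{4N}\Big]^p \Big(1 - \Big[|z|^2 + \frac{i\,\mathrm{Im}[\omega\bar z]}{N^{1/2}} - \frac{|\omega|^2}{4N}\Big]\Big)\,\frac{1 - (1+|\omega|^2)\,e^{-|\omega|^2}}{|\omega|^4}\,1_{[0,1]}\Big(\Big|z \pm \frac{\omega}{2N^{1/2}}\Big|\Big)\,N^{-1}\,d^2\omega\,d^2z, \tag{4.49} \]

where we use the approximation symbol $\approx$ to remind ourselves that this is only one part of the eventual formula. Simplifying this, and writing $z = re^{i\theta}$ and $\omega = \rho e^{it}$, we have

\[ \int_{\mathbb{C}}\int_{\mathbb{C}} z_1^p\,\bar z_2^p\,\mathcal{O}_N^{(2)}(z_1,z_2)\,d^2z_1\,d^2z_2 \approx -\frac{N}{\pi^2}\int \Big[r^2 + \frac{i r\rho\sin(t-\theta)}{N^{1/2}} - \frac{\rho^2}{4N}\Big]^p \Big(1 - \Big[r^2 + \frac{i r\rho\sin(t-\theta)}{N^{1/2}} - \frac{\rho^2}{4N}\Big]\Big)\,\frac{1 - (1+\rho^2)\,e^{-\rho^2}}{\rho^4}\,1_{[0,4N]}\big(|\rho e^{it} \pm 2N^{1/2}re^{i\theta}|^2\big)\,r\rho\,dr\,d\rho\,d\theta\,dt. \tag{4.50} \]

Let us denote $\phi = t - \theta$. Integrating over the extra angular variable, simplifying the power of $\rho$ in the second line, and simplifying the indicator in the second line, we obtain

\[ \int_{\mathbb{C}}\int_{\mathbb{C}} z_1^p\,\bar z_2^p\,\mathcal{O}_N^{(2)}(z_1,z_2)\,d^2z_1\,d^2z_2 \approx -\frac{2N}{\pi}\int_{r\in[0,1]}\int_{\rho\ge 0}\int_{\phi\in[0,2\pi)} \Big[r^2 + \frac{i r\rho\sin(\phi)}{N^{1/2}} - \frac{\rho^2}{4N}\Big]^p \Big(1 - \Big[r^2 + \frac{i r\rho\sin(\phi)}{N^{1/2}} - \frac{\rho^2}{4N}\Big]\Big)\,\frac{1 - (1+\rho^2)\,e^{-\rho^2}}{\rho^3}\,1_{[0,N^{1/2}R(r,\phi)]}(\rho)\,r\,dr\,d\rho\,d\phi, \tag{4.51} \]

for

\[ R(r,\phi) = 2\Big[\sqrt{1 - r^2\sin^2(\phi)} - r|\cos(\phi)|\Big], \tag{4.52} \]

arising from the condition $|\rho \pm 2N^{1/2}re^{-i\phi}| \le 2N^{1/2} \iff \rho \le N^{1/2}R(r,\phi)$.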

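Let us rewrite this once again, this time isolating different functional terms that we wish to consider in more detail:

\[ \int_{\mathbb{C}}\int_{\mathbb{C}} z_1^p\,\bar z_2^p\,\mathcal{O}_N^{(2)}(z_1,z_2)\,d^2z_1\,d^2z_2 \approx -\frac{2N}{\pi}\int_{r\in[0,1]}\int_{\phi\in[0,2\pi)} \Big[\int_0^{N^{1/2}R(r,\phi)} F_p(r,\phi,\rho)\,W(\rho)\,d\rho\Big]\,r\,dr\,d\phi, \tag{4.53} \]

where

\[ F_p(r,\phi,\rho) = \Big[r^2 + \frac{i r\rho\sin(\phi)}{N^{1/2}} - \frac{\rho^2}{4N}\Big]^p \Big(1 - \Big[r^2 + \frac{i r\rho\sin(\phi)}{N^{1/2}} - \frac{\rho^2}{4N}\Big]\Big), \tag{4.54} \]

and

\[ W(\rho) = \frac{1 - (1+\rho^2)\,e^{-\rho^2}}{\rho^3}. \tag{4.55} \]

Now we note that we can expand

\[ F_p(r,\phi,\rho) = \sum_{k=0}^{2p+2} f_p^{(k)}(r,\phi)\,\frac{\rho^k}{N^{k/2}}. \tag{4.56} \]

Odd powers $k$ have $f_p^{(k)}(r,\phi)$ which is an odd function of $\sin(\phi)$. Since the rest of the integral will contribute even factors, this means all odd powers will integrate to zero, so we only keep track of even powers. We have already taken account of $f_p^{(0)}(r,\phi)$, which is just $f_p^{(0)}(r) = r^{2p}(1-r^2)$. This was what gave us the leading order divergence we considered in a past subsection.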

Moreover, starting from the even power k = 4, we have 



re[0,l] ?ie[0,27r) 


J^^^Z,p)dp] rdrd4> 


0(1). (4.57) 


The reason is that $N \cdot N^{-k/2} = N^{1-k/2}$, which is vanishing. This means that the lower limit of integration is actually contributing a negligible correction, asymptotically for large $N$. Near the upper limit of integration, we may expand $W(\rho) \sim \rho^{-3}$. Therefore we obtain near the upper limit, for $k = 4, 6, \dots$,

Off ( \ 

- ^Nif^-2)/2 J J rdr# 

r-e[0,l] </)e[0,27r) ^ ^ 

= -; / j ff\r,4>) + = 0 ( 1 ). (4.58) 

re[0,l] 0e[O,27r) 


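This only leaves the term with $k = 2$ which might diverge. Indeed, for this, we just have

\[ -\frac{2}{\pi}\int_{r\in[0,1]}\int_{\phi\in[0,2\pi)} f_p^{(2)}(r,\phi)\,\Big[\int_0^{N^{1/2}R(r,\phi)} \frac{1 - (1+\rho^2)\,e^{-\rho^2}}{\rho}\,d\rho\Big]\,r\,dr\,d\phi. \]

The only divergent part of this arises near the upper limit for the $\rho$ integral, which gives $\ln(N^{1/2}R(r,\phi)) = \frac12\ln(N) + \ln(R(r,\phi))$, so the logarithmic divergence is

\[ -\frac{2}{\pi}\int_{r\in[0,1]}\int_{\phi\in[0,2\pi)} f_p^{(2)}(r,\phi)\,\Big[\int_0^{N^{1/2}R(r,\phi)} \frac{1 - (1+\rho^2)\,e^{-\rho^2}}{\rho}\,d\rho\Big]\,r\,dr\,d\phi = -\frac{\ln(N)}{\pi}\int_{r\in[0,1]}\int_{\phi\in[0,2\pi)} r\,f_p^{(2)}(r,\phi)\,dr\,d\phi + O(1). \tag{4.59} \]

It is easy to see that

\[ f_p^{(2)}(r,\phi) = \frac{r^{2p}}{4} - \frac{p}{4}\,r^{2p-2}(1-r^2) - \frac{p(p-1)}{2}\,r^{2p-2}(1-r^2)\sin^2(\phi) + p\,r^{2p}\sin^2(\phi). \tag{4.60} \]

Therefore, we have

\[ \int_0^{2\pi} f_p^{(2)}(r,\phi)\,d\phi = 2\pi\Big[\frac{r^{2p}}{4} - \frac{p}{4}\,r^{2p-2}(1-r^2) - \frac{p(p-1)}{4}\,r^{2p-2}(1-r^2) + \frac{p}{2}\,r^{2p}\Big] = \frac{\pi}{2}\Big[(p+1)^2 r^{2p} - p^2 r^{2p-2}\Big]. \tag{4.61} \]

Therefore, the sub-leading order divergence is now

\[ \int_{\mathbb{C}}\int_{\mathbb{C}} z_1^p\,\bar z_2^p\,\mathcal{O}_N^{(2)}(z_1,z_2)\,d^2z_1\,d^2z_2 + \frac{N}{(p+1)(p+2)} \approx -\frac14\ln(N) + O(1). \tag{4.62} \]

We may consider this particular form. It is independent of $p$. Near the circle, and for $z_1$ near $z_2$, the form of $z_1^p\,\bar z_2^p$ to leading order is just $|z|^{2p}$, which is just 1, because $z$ is near the circle. This is the same explanation for the reason that the order-1, constant term coming from the $\mathcal{O}_N^{(1)}$ term is independent of $p$. We also know that the moment must be independent of $p$.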
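The coefficient of the logarithmic divergence can be double-checked mechanically. The sketch below extracts the coefficient of $(\rho/N^{1/2})^2$ in $X^p(1-X)$, with $X = r^2 + i r\sin(\phi)\,t - t^2/4$, by truncated power-series multiplication. It compares that with a closed form for $f_p^{(2)}$ obtained from the binomial expansion, and then evaluates $\pi^{-1}\int_0^1\int_0^{2\pi} r\,f_p^{(2)}\,d\phi\,dr$ exactly from the $\phi$-integrated form $\frac{\pi}{2}[(p+1)^2r^{2p} - p^2r^{2p-2}]$. All function names are ours, and the closed form in the code is our reading of (4.60).

```python
import math
from fractions import Fraction

def series_mul(a, b, order=2):
    """Multiply two truncated power series (coefficient lists) up to the given order."""
    out = [0j] * (order + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= order:
                out[i + j] += ai * bj
    return out

def f2_from_expansion(p, r, phi):
    """Coefficient of t^2 in X^p (1 - X), where X = r^2 + i r sin(phi) t - t^2 / 4."""
    x = [complex(r * r), 1j * r * math.sin(phi), complex(-0.25)]
    power = [1 + 0j, 0j, 0j]
    for _ in range(p):
        power = series_mul(power, x)
    one_minus_x = [1 - x[0], -x[1], -x[2]]
    return series_mul(power, one_minus_x)[2]

def f2_closed_form(p, r, phi):
    """Closed form for f_p^(2)(r, phi) from the binomial expansion."""
    s2 = math.sin(phi) ** 2
    low = r ** (2 * p - 2) if p >= 1 else 0.0  # only ever multiplied by p or p(p-1)
    return (r ** (2 * p) / 4 - p / 4 * low * (1 - r * r)
            - p * (p - 1) / 2 * low * (1 - r * r) * s2 + p * r ** (2 * p) * s2)

def log_coefficient(p):
    """(1/pi) * int_0^1 r * (pi/2) [ (p+1)^2 r^(2p) - p^2 r^(2p-2) ] dr, exactly."""
    first = Fraction((p + 1) ** 2, 2 * p + 2)
    second = Fraction(p * p, 2 * p) if p >= 1 else Fraction(0)
    return Fraction(1, 2) * (first - second)

for p in range(5):
    print(p, log_coefficient(p), abs(f2_from_expansion(p, 0.7, 1.1) - f2_closed_form(p, 0.7, 1.1)))
```

The exact value is $1/4$ for every $p$, which is the source of the $p$-independent term $-\frac14\ln(N)$.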

One could also try to calculate the order-1 contributions at this point, coming just from the bulk formula for $\mathcal{O}_N^{(2)}(z_1,z_2)$. One could then check whether these combine to a constant independent of $p$. That would be yet another strong check that Chalker and Mehlig's formula for $\mathcal{O}_N^{(2)}(z_1,z_2)$ is true to very high accuracy in the bulk, and only needs an edge correction near the circle.

It would be best to have a sufficiently explicit formula for $\mathcal{O}_N^{(2)}(z_1,z_2)$ to allow one to see the correction near the circle. Then we could have an answer to settle this. Next, we propose a method which we believe could potentially provide this.

4.5 Proposal to Rigorously Approach Chalker and Mehlig's Result

There are various ways to try to prove Chalker and Mehlig's formula for the bulk behavior of $\mathcal{O}_N^{(2)}$. One way is to try to fill in the details to make Chalker and Mehlig's argument rigorous. Their idea is to express $\mathcal{O}_N^{(2)}$ in terms of the expectation of a function of the eigenvalues, and then use the known eigenvalue marginal for the complex Ginibre ensemble.

Here we want to propose a second method. The formula for $\mathcal{O}_N^{(2)}(z_1,z_2)$ is the determinant of a 5-diagonal matrix. One may express such a determinant through a recursion relation, although the recursion relation is significantly more complicated than in the tridiagonal case. It is higher order, and it is a vector-valued recursion relation for a vector with dimension greater than 1. We will not explicate this here. It is well-known, it just follows from Cramer's rule, and it is widely used in numerical codes.

Instead, what we want to advocate here is solving recursion relations, at least asymptotically for large $N$, using adiabatic theory. We have not tried this yet for $\mathcal{O}_N^{(2)}(z_1,z_2)$. There may be formidable difficulties which obstruct this approach, but let us demonstrate the idea for an easier problem: re-deriving the formula for $\mathcal{R}_N^{(1)}(z)$. The key trick for this particular problem is to realize that $\mathcal{R}_N^{(1)}(z)$, at least for the leading-order asymptotic formula, is constant in $z$ for $|z| < 1$.

4.5.1 The Recurrence Relation for $\mathcal{R}_N^{(1)}$ Using Matrices

We are treating the case of $\mathcal{R}_N^{(1)}(z)$ as a simpler toy model, in lieu of treating the real
problem of interest, which is $\mathcal{O}_N^{(2)}(z_1, z_2)$. We hope to be able to handle $\mathcal{O}_N^{(2)}(z_1, z_2)$
later, in another paper.

Recall from (4.20) that $\mathcal{R}_N^{(1)}(z) = \pi^{-1}[(N-1)!]^{-1}\exp(-N|z|^2)D_{N-1}(z)$, which
means from Stirling's formula that

\[
\mathcal{R}_N^{(1)}(z) \sim \frac{1}{\pi} \cdot \frac{1}{\sqrt{2\pi N}}\, e^{-(N-1)\ln(N) + N(1-|z|^2)}\, D_{N-1}(z).
\tag{4.63}
\]


Moreover, recall that there is a recursion relation in (4.22). Namely, defining
$D_{N-1}(\sigma, z) = D_{N-1}(\sigma^{-1}N^{-1/2}z)$, it happens that

\[
D_{n+1}(\sigma, z) = (\sigma^{-2}|z|^2 + n + 1)D_n(\sigma, z) - \sigma^{-2}n|z|^2 D_{n-1}(\sigma, z).
\]

Let us fix $\sigma^2 = N^{-1}$ as Chalker and Mehlig do. Then

\[
D_{n+1}(N, z) = (N|z|^2 + n + 1)D_n(N, z) - Nn|z|^2 D_{n-1}(N, z).
\tag{4.64}
\]

Also, since the answer only depends on the magnitude of $z$, let us write $r = |z|$ so

\[
D_{n+1}(N, r) = (Nr^2 + n + 1)D_n(N, r) - Nnr^2 D_{n-1}(N, r).
\tag{4.65}
\]

We want to calculate $\mathcal{R}_N^{(1)}(r)$, which is asymptotically given by

\[
\mathcal{R}_N^{(1)}(r) \sim \frac{1}{\pi} \cdot \frac{1}{\sqrt{2\pi N}}\, e^{-(N-1)\ln(N) + N(1-r^2)}\, D_{N-1}(N, r).
\tag{4.66}
\]

Now since we have a second-order recursion relation, let us define a two-dimensional
vector $v_n = [D_{n-1}(N,r), D_n(N,r)]^*$. (All our vectors and matrices will be real but
we use the adjoint instead of the transpose because we want to keep the symbol $T$
for other purposes.) Then the recursion relation says that

\[
v_{n+1} = A_n v_n, \qquad
A_n = \begin{bmatrix} 0 & 1 \\ -Nnr^2 & Nr^2 + n + 1 \end{bmatrix},
\tag{4.67}
\]

and we want $D_{N-1}(N,r) = e_2^* v_{N-1}$, where $\{e_1, e_2\}$ is the standard basis for $\mathbb{R}^2$. In
order to have a simpler formula, we note that we can write $v_1 = A_0 e_2$, for $A_0$ defined
as above. In seeking $\mathcal{R}_N^{(1)}(r)$, we really have

\[
\mathcal{R}_N^{(1)}(r) \sim \frac{1}{\pi} \cdot \frac{1}{\sqrt{2\pi N}}\, e^{-(N-1)\ln(N) + N(1-r^2)}\, e_2^* A_{N-2} \cdots A_1 A_0 e_2.
\tag{4.68}
\]

The idea is to try to express this using the spectral decomposition of the matrices
$A_n$, where we use the fact that the matrices $A_n$ are varying slowly in $n$, as much as
possible. This is why we call this the adiabatic approach.
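
As a numerical sanity check of the recursion (4.65) together with the prefactor recalled from (4.20), one can simply iterate the recurrence and compare with the expected bulk value $\pi^{-1}$. The following is a minimal Python sketch, not the adiabatic analysis itself. The function names `log_D` and `density` and the rescaling trick are our own illustrative choices, and we assume the initial values $D_0 = 1$ and $D_1 = Nr^2 + 1$, which are the ones consistent with $v_1 = A_0 e_2$.

```python
import math


def log_D(N, r):
    """Return ln D_{N-1}(N, r) by iterating the recurrence (4.65).

    The pair (D_{n-1}, D_n) is rescaled at every step so that the
    factorially growing values never overflow; the discarded scale
    factors are accumulated in log_scale.
    """
    x = N * r * r
    if N == 1:
        return 0.0                      # D_0 = 1 (assumed initial value)
    d_prev, d_curr = 1.0, x + 1.0       # assumed initial values D_0, D_1
    log_scale = 0.0
    for n in range(1, N - 1):           # produces D_2, ..., D_{N-1}
        d_next = (x + n + 1) * d_curr - n * x * d_prev
        log_scale += math.log(d_next)
        d_prev, d_curr = d_curr / d_next, 1.0
    return log_scale + math.log(d_curr)


def density(N, r):
    """pi^{-1} [(N-1)!]^{-1} exp(-N r^2) D_{N-1}(N, r), computed in logs."""
    log_R = -math.log(math.pi) - math.lgamma(N) - N * r * r + log_D(N, r)
    return math.exp(log_R)
```

For instance, `math.pi * density(400, 0.5)` should be very close to 1, while `density(400, 1.5)` should be extremely small, in line with the dichotomy between $r < 1$ and $r > 1$ discussed below.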


4.5.2 Spectral Formulas and Summary of Main Contribution 

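We may summarize the spectral information as

\[
\lambda_n^{\pm} = \frac{N}{2}\left( r^2 + \frac{n+1}{N} \pm \sqrt{\left(\frac{n+1}{N} - r^2\right)^2 + 4r^2N^{-1}} \right), \qquad
V_n^{\pm} = \begin{bmatrix} 1 \\ \lambda_n^{\pm} \end{bmatrix}, \qquad
W_n^{\pm} = \frac{\pm 1}{\lambda_n^+ - \lambda_n^-}\begin{bmatrix} -\lambda_n^{\mp} \\ 1 \end{bmatrix},
\]
\[
A_n V_n^{\sigma} = \lambda_n^{\sigma} V_n^{\sigma}, \qquad
(W_n^{\sigma})^* A_n = \lambda_n^{\sigma} (W_n^{\sigma})^*, \qquad
(W_n^{\sigma})^* V_n^{\tau} = \delta_{\sigma,\tau}, \qquad \text{for } \sigma, \tau \in \{+1, -1\}.
\tag{4.69}
\]

The form of the square root follows from the identity $(Nr^2 + n + 1)^2 - 4Nnr^2 = (n + 1 - Nr^2)^2 + 4Nr^2$ for the discriminant of the characteristic polynomial of $A_n$.
In particular, $A_n = \lambda_n^+ V_n^+ (W_n^+)^* + \lambda_n^- V_n^- (W_n^-)^*$. Therefore, we can rewrite the
conclusion of the recursion relation as

\[
\begin{aligned}
\mathcal{R}_N^{(1)}(r) &\sim \frac{1}{\pi} \cdot \frac{1}{\sqrt{2\pi N}}\, e^{-(N-1)\ln(N) + N(1-r^2)}\, e_2^* A_{N-2} \cdots A_1 A_0 e_2 \\
&= \frac{1}{\pi} \cdot \frac{1}{\sqrt{2\pi N}}\, e^{-(N-1)\ln(N) + N(1-r^2)}
\sum_{\sigma \in \{+1,-1\}^{N-1}} \left( \prod_{n=0}^{N-2} \lambda_n^{\sigma(n)} \right)
\left( [W_0^{\sigma(0)}]^* e_2 \right)\left( e_2^* V_{N-2}^{\sigma(N-2)} \right)
\prod_{n=0}^{N-3} [W_{n+1}^{\sigma(n+1)}]^* V_n^{\sigma(n)}.
\end{aligned}
\tag{4.70}
\]

Anticipating that the main contribution to this sum will be $\sigma(0) = \cdots = \sigma(N-2) =
+1$, we may rewrite this as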

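\[
\mathcal{R}_N^{(1)}(r) \sim \frac{1}{\pi} \cdot \frac{1}{\sqrt{2\pi N}}\, e^{-(N-1)\ln(N) + N(1-r^2)}\, M_N(r)\, \mathcal{P}_N(r),
\tag{4.71}
\]

where $M_N(r)$ is the "main term"

\[
M_N(r) = \left( \prod_{n=0}^{N-2} \lambda_n^+ \right) \left( [W_0^+]^* e_2 \right)\left( e_2^* V_{N-2}^+ \right) \cdot \left( \prod_{n=0}^{N-3} [W_{n+1}^+]^* V_n^+ \right)
\tag{4.72}
\]

and $\mathcal{P}_N(r)$ will be a series of perturbations

\[
\mathcal{P}_N(r) = \sum_{\sigma \in \{+1,-1\}^{N-1}}
\left( \prod_{n=0}^{N-2} \frac{\lambda_n^{\sigma(n)}}{\lambda_n^+} \right)
\frac{\left( [W_0^{\sigma(0)}]^* e_2 \right)\left( e_2^* V_{N-2}^{\sigma(N-2)} \right)}{\left( [W_0^+]^* e_2 \right)\left( e_2^* V_{N-2}^+ \right)}
\left( \prod_{n=0}^{N-3} \frac{[W_{n+1}^{\sigma(n+1)}]^* V_n^{\sigma(n)}}{[W_{n+1}^+]^* V_n^+} \right).
\tag{4.73}
\]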

We know that we are trying to find that the leading order behavior of $\mathcal{R}_N^{(1)}(r)$ is
as follows: it is constant, equal to $\pi^{-1}$ for $r < 1$, and it is exponentially small for
$r > 1$. We will not try to recover the boundary behavior near $r = 1$ in this note. (In
fact, what we hope to be able to do in a later paper is to calculate $\mathcal{O}_N^{(2)}(z_1, z_2)$ in a
similar way, and especially to determine the edge behavior when $z_1$ and $z_2$ are near
the circle.) Let us quickly note how we may dispense with the $r > 1$ case so that we
may focus on $r < 1$.


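The largest contribution to $M_N(r)$ comes from the product of eigenvalues

\[
\prod_{n=0}^{N-2} \lambda_n^+ = \exp\left( \sum_{n=0}^{N-2} \ln(\lambda_n^+) \right)
= e^{(N-1)\ln(N)} \exp\left( \sum_{n=0}^{N-2} \ln\left( \frac{r^2}{2} + \frac{n+1}{2N} + \frac{1}{2}\sqrt{\left(\frac{n+1}{N} - r^2\right)^2 + 4r^2N^{-1}} \right) \right).
\tag{4.74}
\]

Moreover, defining $t_{n+1} = (n+1)/N$, the sum is $(N-1)$ times a Riemann sum
approximation so that:

\[
\begin{aligned}
\frac{1}{N} \ln\left( e^{-(N-1)\ln(N)} \prod_{n=0}^{N-2} \lambda_n^+ \right)
&= \int_0^1 \ln\left( \frac{1}{2}\left( r^2 + t + \sqrt{(t - r^2)^2 + 4r^2N^{-1}} \right) \right) dt + o(1) \\
&= \int_0^1 \ln\left( \frac{1}{2}\left( r^2 + t + \sqrt{(t - r^2)^2} \right) \right) dt + o(1) \\
&= \int_0^1 \ln(\max\{r^2, t\})\, dt + o(1),
\end{aligned}
\tag{4.75}
\]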


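where the remainder term $o(1)$ is a quantity which converges to 0 as $N \to \infty$. Hence
we may see, by integrating, that

\[
\lim_{N \to \infty} \frac{1}{N} \ln\left( e^{-(N-1)\ln(N)} \prod_{n=0}^{N-2} \lambda_n^+ \right) =
\begin{cases}
\ln(r^2) & \text{if } r > 1, \\
r^2 - 1 & \text{if } r \in [0,1].
\end{cases}
\tag{4.76}
\]

This means that, incorporating the exponential part of the prefactor for $\mathcal{R}_N^{(1)}(r)$,

\[
\lim_{N \to \infty} \frac{1}{N} \ln\left( e^{-(N-1)\ln(N) + N(1-r^2)} \prod_{n=0}^{N-2} \lambda_n^+ \right) =
\begin{cases}
0 & \text{if } r \in [0,1], \\
\ln(r^2) + 1 - r^2 & \text{if } r > 1,
\end{cases}
\tag{4.77}
\]

and it is easy to see that $\ln(x) < x - 1$ for all $x \in (0, \infty)$ with $x \neq 1$, by convexity of $-\ln(x)$. For
$r > 1$ this is therefore exponentially small to leading order. We claim that no
other factor is exponentially large, so that we obtain

\[
\lim_{N \to \infty} N^{-1} \ln\left( \mathcal{R}_N^{(1)}(r) \right) =
\begin{cases}
0 & \text{if } r \in [0,1], \\
\ln(r^2) + 1 - r^2 & \text{if } r > 1.
\end{cases}
\tag{4.78}
\]

Therefore, we will henceforth assume $r < 1$.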
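
The limit (4.76) can be checked numerically by evaluating the normalized logarithm of the eigenvalue product for a large but finite $N$. The snippet below is a rough sketch under our own naming (`lam_plus`, `scaled_log_product` and `predicted_limit` are illustrative helpers, not notation from the text), using the closed form of $\lambda_n^+$ from (4.69).

```python
import math


def lam_plus(n, N, r):
    """Larger eigenvalue lambda_n^+ of A_n, as in (4.69)."""
    t = (n + 1) / N
    return 0.5 * N * (r * r + t + math.sqrt((t - r * r) ** 2 + 4 * r * r / N))


def scaled_log_product(N, r):
    """(1/N) * ln( N^{-(N-1)} * prod_{n=0}^{N-2} lambda_n^+ )."""
    return sum(math.log(lam_plus(n, N, r) / N) for n in range(N - 1)) / N


def predicted_limit(r):
    """Right-hand side of (4.76)."""
    return math.log(r * r) if r > 1 else r * r - 1
```

With $N$ of order $10^4$ the two quantities should agree to a few decimal places, the discrepancy coming from the Riemann sum error and from the $4r^2N^{-1}$ term under the square root.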

When $r < 1$, we claim that we need to do a more careful analysis of the product.
The time scale $t_n = n/N$ is too rough when $t_n$ is near $r^2$. The purely discrete scale $n$
is too fine. Therefore, we use the intermediate time scale $T_n = N^{1/2}(t_n - r^2)$ instead.

Then we may rewrite

\[
\lambda_n^+ = N \exp\left( \gamma^+(T_{n+1}) \right), \qquad
\gamma^+(T_{n+1}) = \ln\left( r^2 + \frac{1}{2}N^{-1/2}T_{n+1} + \frac{1}{2}N^{-1/2}\sqrt{T_{n+1}^2 + 4r^2} \right),
\tag{4.79}
\]

so that

\[
\prod_{n=0}^{N-2} \lambda_n^+ = N^{N-1} \exp\left( \sum_{n=0}^{N-2} \gamma^+(T_{n+1}) \right).
\tag{4.80}
\]


Then we use the Euler-Maclaurin summation formula to obtain all other terms in the
asymptotic series which are significant, including some boundary terms that come
with the Euler-Maclaurin formula. (To do the integrals involved,
one may find it useful to define $T = 2r\sinh(x)$ so that $dT = 2r\cosh(x)\,dx$ and
$\sqrt{T^2 + 4r^2} = 2r\cosh(x)$, as well.) Doing all this leads to



~ e • r(l 


^2yN\niN)-Nil-r^) 


It is also easy to use the definitions of $V_n^+$ and $W_n^+$ to show that


(4.81) 


\[ [W_0^+]^* e_2 \sim N^{-1}r^{-2} \quad \text{and} \quad e_2^* V_{N-2}^+ \sim N. \tag{4.82} \]


Using the Euler-Maclaurin summation formula, one may also prove that 

Af-l 

~ (4-83) 

n=0 

The details of the Euler-Maclaurin summation formula for this product as well as
for the product of the eigenvalues are not trivial. (The product of the eigenvalues is
harder than the product of the inner-products.) They may be done, in particular, by
using the intermediate time-scale parameter $T_n$. Therefore, we obtain

\[
M_N(r) \sim \sqrt{2\pi N}\, \exp\left( (N-1)\ln(N) - N(1 - r^2) \right).
\tag{4.84}
\]

Therefore, since

\[
\mathcal{R}_N^{(1)}(r) \sim \pi^{-1}(2\pi N)^{-1/2} \exp\left( -(N-1)\ln(N) + N(1 - r^2) \right) M_N(r)\, \mathcal{P}_N(r),
\]

we see that

\[
\mathcal{R}_N^{(1)}(r) \sim \pi^{-1}\, \mathcal{P}_N(r).
\tag{4.85}
\]

Now we will argue that $\mathcal{P}_N(r)$ is actually independent of $r$, to leading order.


4.6 Invariance of the Perturbation Series $\mathcal{P}_N(r)$


Let us write

\[
\mathcal{P}_N(r) = \sum_{\sigma \in \{+1,-1\}^{N-1}} \mathcal{P}_N(\sigma; r),
\tag{4.86}
\]

for


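\[
\mathcal{P}_N(\sigma; r) =
\left( \prod_{n=0}^{N-2} \frac{\lambda_n^{\sigma(n)}}{\lambda_n^+} \right)
\frac{\left( [W_0^{\sigma(0)}]^* e_2 \right)\left( e_2^* V_{N-2}^{\sigma(N-2)} \right)}{\left( [W_0^+]^* e_2 \right)\left( e_2^* V_{N-2}^+ \right)}
\left( \prod_{n=0}^{N-3} \frac{[W_{n+1}^{\sigma(n+1)}]^* V_n^{\sigma(n)}}{[W_{n+1}^+]^* V_n^+} \right).
\tag{4.87}
\]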


Let us think of $\sigma$ as a sequence of switches, from the $+$ state to the $-$ state, or
vice-versa.

Using the notation $t_n = n/N$ and $T_n = N^{1/2}(t_n - r^2)$, we may write


[w^:+i]*(Kr-K+i) = 


r7V-i/2 


2^757+4^3 


27757 + 4^5 


1 + a 


1 + a 


Tn + ^n+l 


7T2 + 4r2 + 7r„7i + 4r7 

+ ^n +1 

v/7fT^+V^^+, + 4r2 


at,, 

(4.88) 


where we define $\Delta T_n = T_{n+1} - T_n = N^{-1/2}$. This means that in a time $\Delta T_n$ there
is a factor proportional to $\Delta T_n$ contributing to $\mathcal{P}_N(\sigma; r)$, if we switch from $\sigma = +$
to $\tau = -$ or from $\sigma = -$ to $\tau = +$, because in these cases $1 + \sigma\tau = 0$. This is
representative of a Poisson process of jumps.

Moreover, if at $a$ one jumps from $+$ to $-$ and at $b$ one jumps back to $+$, then for
all $n \in \{a, \ldots, b-1\}$ there is a contribution to $\mathcal{P}_N(\sigma; r)$ equal to




+ liV-iAT,+i - 

^2 + liV-i/2T,+i + iiV-V2^T^jj+4r2 


1 + 


Tn-\-Tn-\-l 




1 + 


2 ^n+T!^- 1-1 


2v/T^+i+4r2 ^T2+4r2+^T2_^^+4r2 


at. 


at. 


(4.89) 




AT, ), when 


and this quantity is asymptotic to exp - ^,^2 ^^ 4^2 

one takes TV —)■ cx) if one also takes a sequence of T,^ such that |T,^|/iV^A g. 
Moreover the product is decreasing very rapidly as $|T_n|$ gets large on an order-1 scale.
Therefore, the correction to this asymptotic formula is negligible, for the purpose of
calculating the leading order behavior of $\mathcal{P}_N(r)$. Therefore, defining $\mathcal{P}_N^{++}(r)$ to be
the sum of those $\mathcal{P}_N(\sigma; r)$ with $\sigma$ starting at $+$ at the left endpoint and returning
to $+$ at the right endpoint, with some number of intervals of $-$ in between, we have
the effect of switching from $+$ to $-$, staying at $-$ for an interval, and then switching
back. This gives


lim V^'^ir) 


N^OO 


oo « 

_1 'J —C 


K 

X=l -00<Sl<-"<S2K«Xl 

rS2k 

exp 1 - / 


1 + 


*S'2fc-l 


‘^\l^Ik-I + \ \l^Ik-I + 

v/52 _|_ 4J..2 


if 

n 

A:=l 


A:=l '^'S'2fc-l 
1 


1 

7-2 + 4r2 

S2k 


ds 


1 - 


00 « 

i + E(-i)"/ 

x=i 


V^l. + 4r2 V + 4r2 

K 

n 


■ ■ ■ dS2n 


<a;i<---<X2if <00 y._^ \ [1 + exp(— 2 a; 2 fc-i)] [1 + exp( 2 a; 2 fc)] 

[ ^ fX2k \ 

exp j — / [4 cosh^(a;) — l]dx \ dxi ■ ■ ■ dx 2 K , 

V k=l •^X2k-l J 

(4.90) 


where we made the change of variables $S_k = 2r\sinh(x_k)$, which is useful, as we have
also mentioned before. Let us comment on where the $r$-dependence went. In fact the
limits of integration for $S_1$ and $S_{2K}$ should be $-r^2N^{1/2} < S_1$ and $S_{2K} < (1 - r^2)N^{1/2}$.
Since the exponentials are negative (and growing in magnitude), the integrand is
converging rapidly. Therefore, we can replace the limits of integration, by allowing
integrals over all space, with a correction due to the tails of the integrals which are
exponentially small. Then the substitution we have made from $S_k$ to $x_k$ eliminates
the $r$ dependence, entirely. Finally, we mention that we can do the integral in the
exponential to simplify the formula, a bit:


lim P^+(r) 

N^OO 


exp ^ In (l + e sinh( 2 a; 2 fc-i) - X 2 k-i\ 

-CX)<Xi<---<X2K<^ \k=l 


K 


i-KJ n 

i+E(-i)"/ 

K=1 

exp ^ [in (l + sinh( 2 a; 2 fc) - X 2 k] j dxi ■ ■ ■ dx 2 K 

OO 

= 1 + 5 : (-!)''■ 

K=l 

g-Z;f=i{ln[cosh(a;2fc_i)]+ln[cosh(a;2fc)]+sinh(2x2fe)-sinh(2a;2fc_i)) . . . dX2K 

(4.91) 


' —00<Xl<-"<X2K<'Xl 


Again, note that this is rapidly decreasing as $x_1 \to -\infty$ or $x_{2K} \to \infty$. To get
the analogous terms $\mathcal{P}_N^{+-}(r)$, $\mathcal{P}_N^{-+}(r)$ and $\mathcal{P}_N^{--}(r)$, we can just alter this formula
essentially by taking $x_1 \to -\infty$ or $x_{2K} \to \infty$ or both. (This is not entirely correct
because we lose terms corresponding to the density for crossing, but morally it is still
correct because the terms remaining are certainly going to 0.) Therefore

\[
\lim_{N \to \infty} \mathcal{P}_N(r) = \lim_{N \to \infty} \mathcal{P}_N^{++}(r).
\tag{4.92}
\]

Since we know that $\lim_{N \to \infty} \mathcal{R}_N^{(1)}(r)$ must equal $\pi^{-1}$ on the disk (for instance because
the area of the disk is $\pi$) this leaves the calculation to show that


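\[
1 + \sum_{K=1}^{\infty} (-1)^K \int_{-\infty < x_1 < \cdots < x_{2K} < \infty}
e^{-\sum_{k=1}^{K} \left( \ln[\cosh(x_{2k-1})] + \ln[\cosh(x_{2k})] + \sinh(2x_{2k}) - \sinh(2x_{2k-1}) \right)}\, dx_1 \cdots dx_{2K}
\;\stackrel{?}{=}\; 1.
\tag{4.93}
\]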


At this time we cannot see a direct method to prove this, but we hope to explore it
in a later paper.

4.7 Summary and Outlook


We have considered the complex Ginibre ensemble. We consider the problem of
calculating the mixed matrix moments to be a nice pedagogical problem. It may be
used to illustrate the method of using concentration of measure to derive nonlinear
recursion relations. This method is particularly important in spin glass theory, where
it led to the Ghirlanda-Guerra identities, which are critical to those models.


The most natural connection between spin glasses and random matrices is the
spherical spin glasses of [Kosterlitz et al., 1976] and [Crisanti and Sommers, 1995].
These have been studied vigorously with very detailed results. See for example [Auffinger
et al., 2013]. The relation we have drawn between the overlaps in spin glasses and
the moments in random matrix theory is mainly illustrative, to suggest the central
role of concentration of measure (COM). In addition to spin glass theory and random
matrix theory, the idea of using COM to derive low-dimensional nonlinear equations
to replace linear equations in high dimensions is helpful in a variety of contexts
[Chatterjee and Kirkpatrick, 2012].


The mixed matrix moments for the complex Ginibre ensemble are particularly nice
moments to consider because the combinatorics are as simple as possible. (Indeed they
are somewhat simpler than the usual Catalan numbers that arise in the GUE/GOE
moments or the bipartite Catalan numbers that arise in the Marcenko-Pastur law.)
Also, they are not as well-studied as the other moments for the classical Gaussian
matrix ensembles, but they are still well-studied. However, an interesting facet which
has not been exhaustively studied is their relation to the overlap functions defined by
Chalker and Mehlig.


Chalker and Mehlig's papers are extremely interesting and introduce what certainly
seems like a key object in random matrix theory that has not been taken up
sufficiently yet by mathematicians. It is recognized as a key result by theoretical and
mathematical physicists. See, for instance, the recent paper [Burda et al., 2014].

Chalker and Mehlig did not consider the application of calculating the mixed matrix
moments from their overlap functions. Indeed, since the mixed matrix moments
are already known, the reverse problem seems more reasonable, but it would probably
be very difficult to calculate the overlap functions just from the mixed matrix moments.
However, what is true is that, if one takes Chalker and Mehlig's formula for
the bulk overlap functions, then the mixed matrix moments do place some constraints
on the edge behavior, as we have shown.

We have proposed a possible method for calculating $\mathcal{O}_N^{(2)}(z_1, z_2)$, asymptotically,
but we have not carried out this suggestion. We did illustrate it by re-deriving $\mathcal{R}_N^{(1)}(z)$
by treating the second-order recursion formula as an adiabatic matrix evolution problem.


Now we would like to suggest another interesting direction for further study. Fyodorov
and Mehlig, and Fyodorov and Sommers, calculated two very interesting examples
of non-Hermitian random matrices for which they obtained exact expressions
for the overlap functions [Fyodorov and Mehlig, 2002; Fyodorov and Sommers, 2003].
They did not yet calculate the mixed matrix moments for these random matrices.
It would be an ideal problem to do so, and check the formulas linking the overlap
functions and the mixed matrix moments.


In a private communication with Shannon Starr, Fyodorov has explained that the
eigenfunction non-orthogonality in the systems considered in [Fyodorov and Mehlig,
2002; Fyodorov and Sommers, 2003] has physical relevance. The overlap was shown
by Fyodorov and Savin to give the resonance shift if one perturbs a scattering system
[Fyodorov and Savin, 2012]. This was even experimentally verified recently [Gros
et al., 2014].


Finally, the first two overlap functions only help with calculating mixed matrix
moments of the Ginibre ensemble of the form $\operatorname{tr}[H^p (H^*)^p]$ for $p = 1, 2, \ldots$. In order
to calculate mixed matrix moments for more than two factors one needs higher order
overlap functions. Given the difficulty of calculating the first two, this is a formidable
problem, but it might be a reasonable exact calculation for the matrix ensembles
considered by Fyodorov and his collaborators.

5 Mallows Random Permutations


5.1 A q-Stirling's Formula


Before we say anything about a q-deformed Stirling's formula, recall that Stirling's
formula says that

\[
n! \sim \sqrt{2\pi n}\, \frac{n^n}{e^n}.
\]

This is an asymptotic formula. We use the $\sim$ symbol to denote that

\[
\lim_{n \to \infty} \frac{n!}{\sqrt{2\pi n}\, n^n e^{-n}} = 1.
\]

It is worthwhile to note that this formula can be proved using the Euler-Maclaurin
formula discussed in chapter 2.

For fixed $0 < q < 1$, we define

\[
[n]_q = \frac{1 - q^n}{1 - q}.
\]

We can then define a q-deformed factorial as

\[
[n]_q! = [n]_q [n-1]_q \cdots [1]_q = \prod_{k=1}^{n} \frac{1 - q^k}{1 - q}.
\]

For notational convenience, we will denote $[n]_q$ by $[n]$ and $[n]_q!$ by $[n]!$, suppressing the
dependence on $q$. In a work in progress with Shannon Starr, we require a Stirling-type
formula (or asymptotic formula) for $[n]!$. A similar formula was first proved by Moak
[Moak, 1984]. At the time this formula was proved, we were unaware of his work.
As our methods and approximation are slightly different, we include our version and
proof of the q-Stirling formula here.

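Theorem 5.1.1. For $\beta \in \mathbb{R}$, let $q = \exp(-\beta/n)$. Let $[n]!$ denote $[n]_q!$ for this
particular $q$. Then we have

\[
\ln\left( \frac{[n]!}{n!} \right) = nA(\beta) + B(\beta) + R_n(\beta), \qquad
A(\beta) = \int_0^1 \ln\left( \frac{1 - e^{-\beta y}}{\beta y} \right) dy, \qquad
B(\beta) = \frac{\beta}{2} + \frac{1}{2}\ln\left( \frac{1 - e^{-\beta}}{\beta} \right),
\]

where $R_n(\beta)$ is a remainder term and $R_n(\beta) \to 0$ as $n \to \infty$.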


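Proof. First consider

\[
\ln\left( \frac{[n]!}{n!} \right).
\]

As mentioned previously, we want $q$ to be going to 1 as $n \to \infty$, so we are looking at
$q = e^{-\beta/n}$ for fixed $\beta$. Notice that

\[
\ln\left( \frac{[n]!}{n!} \right) = \sum_{k=1}^{n} \ln\left( \frac{1 - q^k}{(1 - q)k} \right).
\]

To approximate this sum, we use the Euler-Maclaurin approach and compare the
sum to

\[
\int_1^n \ln\left( \frac{1 - q^x}{(1 - q)x} \right) dx.
\]

For ease of notation, let

\[
f(x) = \ln\left( \frac{1 - q^x}{(1 - q)x} \right).
\]

In order to make this comparison, we will first compare $\frac{1}{2}f(k+1) + \frac{1}{2}f(k)$ to

\[
\int_k^{k+1} f(x)\, dx
\]

and then sum over the $k$'s. Using the fundamental theorem of calculus, we know that
$\frac{f(k) + f(k+1)}{2}$ is

\[
\int_k^{k+1} \frac{d}{dx}\left[ \left( x - k - \frac{1}{2} \right) f(x) \right] dx.
\tag{5.1}
\]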


Evaluating the derivative in the integrand of (5.1) gives

\[
\int_k^{k+1} \left[ f(x) + \left( x - k - \frac{1}{2} \right) f'(x) \right] dx.
\]

This can be broken up into two integrals,

\[
\int_k^{k+1} f(x)\, dx + \int_k^{k+1} \left( x - k - \frac{1}{2} \right) f'(x)\, dx,
\tag{5.2}
\]

where the first integral is exactly what we wanted to compare to and the second
integral is an error term. Consider now only this error term

\[
\int_k^{k+1} \left( x - k - \frac{1}{2} \right) f'(x)\, dx.
\tag{5.3}
\]

Notice that

\[
\int_k^{k+1} \left( x - k - \frac{1}{2} \right) dx = 0,
\]

so we can add or subtract any constant from $f'(x)$ without changing the value of the
integral. Using this fact, (5.3) can be written as

\[
\int_k^{k+1} \left( x - k - \frac{1}{2} \right) \left[ f'(x) - f'(k) \right] dx.
\]


Since this equation is true for any $1 \le k \le n - 1$, we can simplify this and let $k = 0$
and $k + 1 = 1$, which gives

\[
\int_0^1 \left( x - \frac{1}{2} \right) \left[ f'(x) - f'(0) \right] dx.
\tag{5.4}
\]

This substitution will not cause any problems, because we can always replace $f(x)$
by $f(x + k)$ later. Since $f'(x) - f'(0)$ can be rewritten as

\[
\int_0^x f''(y)\, dy,
\]

we can write (5.4) as the double integral

\[
\int_0^1 \int_0^x \left( x - \frac{1}{2} \right) f''(y)\, dy\, dx.
\]

Switching the order of integration gives

\[
\int_0^1 \int_y^1 \left( x - \frac{1}{2} \right) f''(y)\, dx\, dy.
\]

After integrating with respect to $x$ we are left with

\[
\frac{1}{2} \int_0^1 y(1 - y) f''(y)\, dy.
\]

Using this combined with (5.2), we have shown that

\[
\frac{1}{2}f(0) + \frac{1}{2}f(1) = \int_0^1 f(x)\, dx + \frac{1}{2} \int_0^1 x(1 - x) f''(x)\, dx.
\]
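
As a quick consistency check of this trapezoid identity, one may try it on a simple test function. The choice $f(x) = e^x$ below is ours and is purely illustrative; it is not the function $f$ used in the proof.

```latex
% Check of  (1/2) f(0) + (1/2) f(1) = \int_0^1 f + (1/2) \int_0^1 x(1-x) f''  for f(x) = e^x.
\[
\int_0^1 e^x\,dx = e - 1, \qquad
\int_0^1 x e^x\,dx = 1, \qquad
\int_0^1 x^2 e^x\,dx = e - 2,
\]
\[
\frac{1}{2}\int_0^1 x(1-x)\,e^x\,dx = \frac{1}{2}\bigl(1 - (e - 2)\bigr) = \frac{3 - e}{2},
\]
\[
\int_0^1 e^x\,dx + \frac{1}{2}\int_0^1 x(1-x)\,e^x\,dx
  = (e - 1) + \frac{3 - e}{2} = \frac{1 + e}{2} = \frac{1}{2}f(0) + \frac{1}{2}f(1).
\]
```

Both sides agree, as expected for any twice continuously differentiable $f$ on $[0,1]$.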


At this point we can return our attention to the original problem, in which we need
to sum up all of these integrals. Since $f(1) = 0$,

\[
\sum_{k=1}^{n} f(k) = \frac{1}{2}f(n) + \sum_{k=1}^{n-1} \frac{1}{2}\bigl( f(k) + f(k+1) \bigr).
\tag{5.5}
\]

From the previous calculation,

\[
\sum_{k=1}^{n-1} \frac{1}{2}\bigl( f(k) + f(k+1) \bigr) = \int_1^n f(x)\, dx + \frac{1}{2} \sum_{k=1}^{n-1} \int_0^1 x(1 - x) f''(k + x)\, dx.
\]


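Since we can move the summation inside the integral, we now turn our attention to
$f''(x)$ to see if this sum will converge. Calculating $f''(x)$ gives

\[
f''(x) = -\frac{(\ln(q))^2 q^x}{(1 - q^x)^2} + \frac{1}{x^2}.
\]

Substituting $e^{-\beta/n}$ for $q$ gives

\[
-\frac{\beta^2 e^{-\beta x/n}}{n^2 (1 - e^{-\beta x/n})^2} + \frac{1}{x^2}.
\]

Multiplying top and bottom of the first fraction by $e^{\beta x/n}$ gives

\[
-\frac{\beta^2}{n^2 \left( e^{\beta x/2n} - e^{-\beta x/2n} \right)^2} + \frac{1}{x^2}.
\]

Noticing that the bottom of the first fraction is equal to $\left( 2n \sinh\left( \frac{\beta x}{2n} \right) \right)^2$ leaves

\[
-\frac{\beta^2}{4n^2 \sinh^2\left( \frac{\beta x}{2n} \right)} + \frac{1}{x^2}.
\]

This term will converge pointwise to 0 as $\beta/n \to 0$, which will let us apply the
dominated convergence theorem. Going back to (5.5) gives

\[
\ln\left( \frac{[n]!}{n!} \right) = \frac{1}{2}f(n) + \int_1^n f(x)\, dx + \frac{1}{2} \sum_{k=1}^{n-1} \int_0^1 x(1 - x) f''(k + x)\, dx,
\]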


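where this last term will converge in the limit.

If we take $q = \exp(-\beta/n)$ for some $\beta \in \mathbb{R}$, then we obtain

\[
\begin{aligned}
\ln\left( \frac{[n]!}{n!} \right)\bigg|_{q = e^{-\beta/n}}
&= \frac{1}{2} \ln\left( \frac{1 - e^{-\beta}}{[1 - e^{-\beta/n}]\, n} \right)
 + \int_1^n \ln\left( \frac{1 - e^{-\beta x/n}}{(1 - e^{-\beta/n})\, x} \right) dx \\
&\quad + \frac{1}{2} \int_0^1 x(1 - x) \sum_{k=1}^{n-1} \left[ \frac{1}{(k + x)^2} - \frac{\beta^2}{4n^2 \sinh^2(\beta[k + x]/2n)} \right] dx.
\end{aligned}
\]

We can rewrite this as

\[
\ln\left( \frac{[n]!}{n!} \right)\bigg|_{q = e^{-\beta/n}} = nA(\beta) + B(\beta) + R_n(\beta),
\]

where $A(\beta)$ and $B(\beta)$ do not depend on $n$ and $R_n(\beta)$ is a "small" remainder term,
which vanishes for $\beta$ fixed in the limit $n \to \infty$. More precisely,

\[
A(\beta) = \frac{1}{n} \int_0^n \ln\left( \frac{1 - e^{-\beta x/n}}{\beta x/n} \right) dx,
\]

which can be seen to be independent of $n$, by making a change of variables, $x = ny$
so that $dx = n\, dy$. We can write

\[
B(\beta) = \frac{\beta}{2} + \frac{1}{2} \ln\left( \frac{1 - e^{-\beta}}{\beta} \right).
\]

We throw all of the error terms we accumulated into the last term. It is convenient
to break it into three parts:

\[
R_n(\beta) = R_n^{(1)}(\beta) + R_n^{(2)}(\beta) + R_n^{(3)}(\beta),
\]

where

\[
\begin{aligned}
R_n^{(1)}(\beta) &= \frac{1}{2} \int_0^1 x(1 - x) \sum_{k=1}^{n-1} \left[ \frac{1}{(k + x)^2} - \frac{\beta^2}{4n^2 \sinh^2(\beta[k + x]/2n)} \right] dx, \\
R_n^{(2)}(\beta) &= \int_1^n \ln\left( \frac{1 - e^{-\beta x/n}}{(1 - e^{-\beta/n})\, x} \right) dx - nA(\beta) - \frac{\beta}{2}, \\
R_n^{(3)}(\beta) &= \frac{1}{2} \ln\left( \frac{1 - e^{-\beta}}{(1 - e^{-\beta/n})\, n} \right) - \frac{1}{2} \ln\left( \frac{1 - e^{-\beta}}{\beta} \right).
\end{aligned}
\]

At this point, the proof of our theorem is complete, provided that we prove the
following lemma. □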


Lemma 5.1.1. For $\beta \in \mathbb{R}$ fixed, we have

\[
R_n^{(1)}(\beta),\; R_n^{(2)}(\beta),\; R_n^{(3)}(\beta) \to 0,
\]

as $n \to \infty$.


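Proof. We immediately know that $R_n^{(1)}(\beta)$ converges to 0, as $n \to \infty$, since

\[
R_n^{(1)}(\beta) = \frac{1}{2} \int_0^1 x(1 - x) \sum_{k=1}^{n-1} \left[ \frac{1}{(k + x)^2} - \frac{\beta^2}{4n^2 \sinh^2(\beta[k + x]/2n)} \right] dx
\]

and we may use the dominated convergence theorem.

For the next term, we notice

\[
\begin{aligned}
R_n^{(2)}(\beta) &= \int_1^n \ln\left( \frac{1 - e^{-\beta x/n}}{(1 - e^{-\beta/n})\, x} \right) dx - nA(\beta) - \frac{\beta}{2} \\
&= -(n - 1) \ln\left( \frac{1 - e^{-\beta/n}}{\beta/n} \right) - \frac{\beta}{2} - \int_0^1 \ln\left( \frac{1 - e^{-\beta x/n}}{\beta x/n} \right) dx \\
&= -n \ln\left( \frac{1 - e^{-\beta/n}}{\beta/n} \cdot e^{\beta/2n} \right) + \ln\left( \frac{1 - e^{-\beta/n}}{\beta/n} \right) - \int_0^1 \ln\left( \frac{1 - e^{-\beta x/n}}{\beta x/n} \right) dx \\
&= -n \ln\left( \frac{2n}{\beta} \sinh\left( \frac{\beta}{2n} \right) \right) + \ln\left( \frac{1 - e^{-\beta/n}}{\beta/n} \right) - \int_0^1 \ln\left( \frac{1 - e^{-\beta x/n}}{\beta x/n} \right) dx.
\end{aligned}
\]

We know that the integrand converges to $\ln(1) = 0$ pointwise, so that the integral
converges to 0 by DCT, and the middle term likewise converges to $\ln(1) = 0$. For the other term, we know that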


\[
\frac{2n}{\beta} \sinh\left( \frac{\beta}{2n} \right) \to 1,
\]

as $n \to \infty$. Moreover, we have the Taylor expansion

\[
\sinh(x) = x + \frac{x^3}{3!} + \frac{x^5}{5!} + \cdots + \frac{x^{2m+1}}{(2m+1)!} + \cdots,
\]

which means that

\[
\frac{\sinh(x)}{x} = 1 + \frac{x^2}{3!} + \frac{x^4}{5!} + \cdots.
\]

This gives

\[
\frac{2n}{\beta} \sinh\left( \frac{\beta}{2n} \right) = 1 + \frac{\beta^2}{24n^2} + \cdots = 1 + O(n^{-2}), \quad \text{as } n \to \infty,
\]

where $O(n^{-2})$ means that there is a function (which depends on $\beta$ as well as $n$) which
may be bounded by a finite constant $C$ (which is a function $C(\beta)$ depending on $\beta$)
times $n^{-2}$ for sufficiently large values of $n$. This means

\[
\ln\left( \frac{2n}{\beta} \sinh\left( \frac{\beta}{2n} \right) \right) = O(n^{-2}), \quad \text{as } n \to \infty,
\]

since $\ln(1 + x) = x + O(x^2)$, as $x \to 0$. Then

\[
-n \ln\left( \frac{2n}{\beta} \sinh\left( \frac{\beta}{2n} \right) \right) = O(n^{-1}), \quad \text{as } n \to \infty,
\]

which means in particular that it converges to 0 as $n \to \infty$ (since $1/n$ does). We
have seen that $R_n^{(2)}(\beta)$ does indeed converge to 0 as $n \to \infty$.

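Finally, we have

\[
R_n^{(3)}(\beta) = \frac{1}{2} \ln\left( \frac{1 - e^{-\beta}}{(1 - e^{-\beta/n})\, n} \right) - \frac{1}{2} \ln\left( \frac{1 - e^{-\beta}}{\beta} \right)
= -\frac{1}{2} \ln\left( \frac{1 - e^{-\beta/n}}{\beta/n} \right).
\]

We know that

\[
\frac{1 - e^{-\beta/n}}{\beta/n} \to 1,
\]

as $n \to \infty$. Therefore, the logarithm converges to 0.

□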
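
The statement of the q-Stirling formula can be tested numerically by summing $\ln([n]!/n!)$ directly and comparing with $nA(\beta) + B(\beta)$. The Python sketch below is only illustrative. It assumes the expressions for $A(\beta)$ and $B(\beta)$ that appear in the proof above, it evaluates $A(\beta)$ with a composite Simpson rule (our choice of quadrature), and the helper names `g`, `A`, `B`, `log_ratio` and `remainder` are ours.

```python
import math


def g(u):
    """ln((1 - e^{-u}) / u), extended continuously by g(0) = 0."""
    if u == 0.0:
        return 0.0
    return math.log(-math.expm1(-u) / u)


def A(beta, m=2000):
    """A(beta) = int_0^1 g(beta * y) dy, by composite Simpson with m (even) panels."""
    h = 1.0 / m
    total = g(0.0) + g(beta)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * g(beta * i * h)
    return total * h / 3


def B(beta):
    """B(beta) = beta/2 + (1/2) ln((1 - e^{-beta}) / beta)."""
    return beta / 2 + 0.5 * g(beta)


def log_ratio(n, beta):
    """ln([n]_q! / n!) for q = exp(-beta/n), summed term by term."""
    u = beta / n
    # (1 - q^k) / ((1 - q) k) = exp(g(k u) - g(u))
    return sum(g(k * u) - g(u) for k in range(1, n + 1))


def remainder(n, beta):
    """R_n(beta) = ln([n]!/n!) - n A(beta) - B(beta)."""
    return log_ratio(n, beta) - n * A(beta) - B(beta)
```

If the formula is right, `remainder(n, beta)` should shrink as `n` grows for fixed `beta`; the restriction to `beta != 0` in `B` is implicit, since `g(0)` is only defined by continuity.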


5.2 The Mallows Measure

Given a permutation of $n$ numbers $\pi \in S_n$, we define the inversion number $\mathrm{Inv}(\pi)$ to
be

\[
\mathrm{Inv}(\pi) = \#\{(i, j) : i < j \text{ and } \pi(i) > \pi(j)\}.
\]

For each $q \in (0,1)$, the Mallows measure [Mallows, 1957] is defined by

\[
\mu_{n,q}(\pi) = \frac{q^{\mathrm{Inv}(\pi)}}{Z_{n,q}},
\]

where $Z_{n,q}$ is a normalization constant given by

\[
Z_{n,q} = \sum_{\pi \in S_n} q^{\mathrm{Inv}(\pi)} = \prod_{k=1}^{n} \frac{1 - q^k}{1 - q} = [n]_q!,
\]

where $[n]_q!$ is as stated in the previous section. The measure is related to the Iwahori-Hecke
algebra as shown by Diaconis and Ram [Diaconis et al., 2000]. Note that
for $q = 1$ (interpreting $[n]_q!$ as its limit $n!$), the Mallows measure is just the uniform measure on $S_n$, with all $n!$
permutations equally likely.
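
For small $n$ the definition can be checked by brute force. The following Python sketch enumerates $S_n$, counts inversions, and builds the Mallows weights; the function names `inversions`, `q_factorial` and `mallows_weights` are our own, and the enumeration is only feasible for small $n$.

```python
from itertools import permutations


def inversions(perm):
    """Number of pairs i < j with perm[i] > perm[j]."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])


def q_factorial(n, q):
    """[n]_q! = prod_{k=1}^n (1 - q^k) / (1 - q), for 0 < q < 1."""
    result = 1.0
    for k in range(1, n + 1):
        result *= (1 - q ** k) / (1 - q)
    return result


def mallows_weights(n, q):
    """Dictionary mapping each permutation of (1, ..., n) to q^Inv / [n]_q!."""
    Z = q_factorial(n, q)
    return {p: q ** inversions(p) / Z for p in permutations(range(1, n + 1))}
```

Summing the values of `mallows_weights(n, q)` should give 1 up to rounding, which is a direct check that $\sum_{\pi \in S_n} q^{\mathrm{Inv}(\pi)} = [n]_q!$.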


5.3 Fisher-Yates Algorithm 


The Fisher-Yates algorithm is a method of obtaining a uniform random permutation
from a finite set. The algorithm was first introduced by Fisher and Yates in [Fisher
et al., 1949]. Their original introduction of the algorithm was as a "paper and pencil"
type algorithm for generating a random permutation of n numbers by hand. The
algorithm was first presented as a computer algorithm by Durstenfeld [Durstenfeld,
1964] and became more widely known in a work by Knuth [Knuth, 2014].

The algorithm consists of the following four steps:

Fisher Yates Algorithm.

1) Set a counting variable i to be equal to 1. Let n denote the length of the desired
sequence. We will let L be a sequence which holds our permutation. We will begin by
letting L = (1).

2) Let m = i + 1. Pick an integer uniformly at random between 1 and m. Call this
integer k.

3) If k = m, then append m to the end of the list L. Otherwise, insert m into L at
position k.

4) Increase i by 1. If i < n, then return to step 2. Otherwise, the algorithm is
complete.
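
The four steps above can be sketched in Python as follows. This is a minimal illustration of the insertion procedure as described here and is not the appendix code; the function name `insertion_shuffle` and the optional `rng` argument (any object providing a `randint(a, b)` method, such as the `random` module itself) are our own assumptions.

```python
import random


def insertion_shuffle(n, rng=random):
    """Build a uniform random permutation of (1, ..., n) by successive insertion."""
    L = [1]                              # step 1: start from the list (1)
    for i in range(1, n):                # i plays the role of the counter
        m = i + 1                        # step 2: next element to place
        k = rng.randint(1, m)            # uniform integer in {1, ..., m}
        if k == m:
            L.append(m)                  # step 3: append at the end
        else:
            L.insert(k - 1, m)           # step 3: insert at (1-indexed) position k
    return L
```

Passing a scripted `rng` makes the procedure deterministic, which is convenient for reproducing a hand-worked example: with the draws 2, 2, 3 and n = 4 the list evolves as (1, 2), (1, 3, 2), (1, 3, 4, 2).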

To see this algorithm in action, we will do an example for n = 4. To begin with,
i = 1 and L = (1). To generate our random integers for this example, we used the
Python function random.randint().

Iteration 1

1) i = 1 and L = (1).

2) Since m = i + 1, m = 2. Generating a random integer between 1 and 2, we get
k = 2.

3) Since k = m, we append m to the end of L, giving L = (1, 2).

4) We increase i to 2, and since i < n, we go back to step 2 for another iteration.

Iteration 2

2) i = 2, so m = 3. Generating a random number between 1 and 3, we get k = 2.

3) Since k < m, we insert m into position k in the list. This gives L = (1, 3, 2).

4) Increasing i by 1, we have i = 3, which is still less than n, so we go on for another
iteration.

Iteration 3

2) i = 3, m = 4.

3) We generate a random number between 1 and 4 and get k = 3. Since k < m, we
insert 4 into position 3, which gives L = (1, 3, 4, 2).

4) Once we increase i by 1, we notice that i = 4, and so the algorithm terminates.


We end up with L = (1, 3, 4, 2) as our random permutation. A Python code for
performing this algorithm on a computer is given in the appendix.

As mentioned, this algorithm shuffles the numbers (1, ..., n) uniformly, so that each
permutation is equally likely. Since we are trying to simulate a Mallows random
permutation, we have adapted this algorithm to return a permutation distributed
according to the Mallows measure.

Fisher Yates Algorithm for a Mallows Permutation.

For a permutation of length n, with Mallows parameter q, we have the following
algorithm to generate a Mallows distributed random permutation.

1) Begin with i = 1 and L = (1).

2) Let m = i + 1 and let k be a random integer distributed according to a geometric
distribution with success probability p = 1 − q, so that P{k = a} = (1 − q)q^(a−1) for a = 1, 2, ....

3) Let j = 1 + ((k − 1) % m), where by % m, we mean modulo m.

4) If j = 1, append m to the end of the list L. Otherwise, insert m into L at position
m + 1 − j.

5) Increment i by 1. If i < n, go back to step 2. Otherwise, the algorithm terminates.
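
A possible Python rendering of these five steps is sketched below. It is our own illustration rather than the appendix code: the helper names `insert_by_residue`, `sample_mallows` and `inversions` are hypothetical, and the geometric variable is drawn by inverting its tail probabilities, which is one standard way to sample it and is assumed here rather than taken from the text.

```python
import math
import random


def insert_by_residue(L, m, j):
    """Step 4: place m so that it creates exactly j - 1 new inversions."""
    if j == 1:
        L.append(m)
    else:
        L.insert(m - j, m)               # (1-indexed) position m + 1 - j


def sample_mallows(n, q, rng=random):
    """Sample a permutation of (1, ..., n) with weight proportional to q^Inv, 0 < q < 1."""
    L = [1]
    for i in range(1, n):
        m = i + 1
        u = 1.0 - rng.random()           # uniform on (0, 1]
        k = 1 + int(math.log(u) / math.log(q))   # geometric: P(k = a) = (1 - q) q^(a - 1)
        j = 1 + (k - 1) % m
        insert_by_residue(L, m, j)
    return L


def inversions(perm):
    """Number of pairs a < b with perm[a] > perm[b]."""
    n = len(perm)
    return sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])
```

The key design point is that inserting m at position m + 1 − j leaves exactly j − 1 smaller elements to its right, so the total number of inversions is the sum of the values j − 1 over all insertions; this is what makes the weights proportional to $q^{\mathrm{Inv}(\pi)}$.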

We will not go through an example here, as this algorithm runs very similarly to
the uniform Fisher Yates algorithm. Since it may not be obvious, we will prove that
this modified algorithm generates a random permutation distributed according to the
Mallows measure.

Theorem 5.3.1. The modified Fisher Yates algorithm stated above does give a
permutation distributed according to the Mallows measure.

Proof. Recall that the Mallows measure is given by

\[
\mu_{n,q}(\pi) = \frac{q^{\mathrm{Inv}(\pi)}}{Z_{n,q}},
\]

where

\[
Z_{n,q} = \prod_{k=1}^{n} \frac{1 - q^k}{1 - q}.
\]

We will prove the theorem by induction. We will start with the case $n = 2$. In
this case, the only possible permutations are $(1,2)$ and $(2,1)$. Based on the Mallows
measure,

\[
\mathbb{P}\{(1,2)\} = \frac{1 - q}{1 - q^2}
\]

and

\[
\mathbb{P}\{(2,1)\} = \frac{q(1 - q)}{1 - q^2}.
\]

Consider our algorithm. We always start with $(1)$. In this case, we will either be
adding 2 at the end of the permutation, or we will be inserting 2 into slot 1, giving
us $(2,1)$. Given the algorithm above, if $j = 1$, then we will get $(1,2)$ and if $j = 2$ we
have $(2,1)$. We have $j = 1$ only if $k = 1, 3, 5, \ldots$. Using the fact that $k$ is a geometric random
variable, we have

\[
\mathbb{P}\{(1,2)\} = \mathbb{P}\{j = 1\} = \mathbb{P}\{k \text{ is odd}\}
= \sum_{i=0}^{\infty} (1 - q) q^{2i} = \frac{1 - q}{1 - q^2}
\]

as desired. On the other hand

\[
\mathbb{P}\{(2,1)\} = \mathbb{P}\{j = 2\} = \mathbb{P}\{k \text{ is even}\}
= \sum_{i=0}^{\infty} (1 - q) q^{2i+1} = \frac{q(1 - q)}{1 - q^2}.
\]

This completes the proof of the base case.


For the inductive step, suppose that for a permutation of length $n$, the algorithm
does in fact give a permutation distributed according to the Mallows measure. In
other words, letting $\pi$ be a permutation of length $n$, we know that

\[
\mathbb{P}_n(\pi) = \frac{q^{\mathrm{Inv}(\pi)}}{\prod_{k=1}^{n} \frac{1 - q^k}{1 - q}}.
\]

Suppose now that $\pi'$ is the same permutation as $\pi$, except with the element $n + 1$
added in via the algorithm given. We need to prove that

\[
\mathbb{P}_{n+1}(\pi') = \frac{q^{\mathrm{Inv}(\pi)}\, q^{\mathrm{Inv}_{n+1}(\pi')}}{\prod_{k=1}^{n+1} \frac{1 - q^k}{1 - q}},
\]

where $\mathrm{Inv}_{n+1}(\pi')$ denotes the number of inversions caused by the element $n + 1$. We
can assume that we have run our algorithm successfully up until $n$ and just need
to perform the last step of the algorithm to add in $n + 1$. In this case, $i = n$ and
$m = n + 1$. If $j = 1$, we know that adding in $n + 1$ will cause no additional inversions,
so $q^{\mathrm{Inv}_{n+1}(\pi')} = 1$. We have $j = 1$ only if $k - 1$ is a multiple of $n + 1$. Using the fact that $k$ is a
geometric variable, we have

\[
\mathbb{P}\{j = 1\} = \mathbb{P}\{(k - 1)\,\%\,(n + 1) = 0\}
= \sum_{i=0}^{\infty} (1 - q) q^{i(n+1)} = \frac{1 - q}{1 - q^{n+1}}.
\]

This implies that

\[
\mathbb{P}_{n+1}(\pi') = \frac{q^{\mathrm{Inv}(\pi)}}{\prod_{k=1}^{n+1} \frac{1 - q^k}{1 - q}}
\]

as desired (since adding in the last point did not cause any additional inversions).


Now suppose that $j = 2$. This implies that $(k - 1)\,\%\,(n + 1) = 1$. This will occur
only if $k - 1 = i(n + 1) + 1$ for some integer $i \ge 0$. If $j = 2$, then $\mathrm{Inv}_{n+1}(\pi') = 1$, since $n + 1$
will only cause an inversion with the element in the $n$th position. We have

\[
\mathbb{P}\{j = 2\} = \sum_{i=0}^{\infty} q^{i(n+1)+1}(1 - q) = \frac{q(1 - q)}{1 - q^{n+1}}
\]

and in this case, we have

\[
\mathbb{P}_{n+1}(\pi') = \frac{q^{\mathrm{Inv}(\pi)}\, q}{\prod_{k=1}^{n+1} \frac{1 - q^k}{1 - q}}
\]

as desired.

This pattern will continue in general. Suppose that adding in $n + 1$ causes $l$
inversions. Then, we know that it must have been added in at position $n + 1 - l$. From
the algorithm, this means that $j = l + 1$. This will occur only if $(k - 1)\,\%\,(n + 1) = l$.
In this case

\[
\mathbb{P}\{j = l + 1\} = \sum_{i=0}^{\infty} q^{i(n+1)+l}(1 - q) = \frac{q^l(1 - q)}{1 - q^{n+1}}.
\]

From this, we have

\[
\mathbb{P}_{n+1}(\pi') = \frac{q^{\mathrm{Inv}(\pi)}\, q^l}{\prod_{k=1}^{n+1} \frac{1 - q^k}{1 - q}},
\]

which completes the proof.

□


5.4 Length of the Longest Increasing Subsequence 


Consider a permutation $\pi \in S_n$. An increasing subsequence $i_1, i_2, \ldots, i_k$ of a
permutation $i \mapsto \pi(i)$ is a subsequence such that $i_1 < \cdots < i_k$ and
$\pi(i_1) < \pi(i_2) < \cdots < \pi(i_k)$. We will be concerned with determining the length
of the longest increasing subsequence in a given permutation. Denote the length of
the longest increasing subsequence of $\pi$ by $\ell(\pi)$.


The following example is due to Aldous and Diaconis [Aldous and Diaconis, 1995]. Consider
the permutation given by

7 2 8 1 3 4 10 6 9 5

where $\pi(1) = 4$, $\pi(2) = 2$, $\pi(3) = 5$, etc. Then a longest increasing subsequence is
given by

1 3 4 6 9

In this case $\ell(\pi) = 5$.
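The value in this example can be checked with a short dynamic program. This is a minimal sketch; the function name `lis_length` is an illustrative assumption, and the quadratic algorithm is chosen for clarity rather than speed.

```python
def lis_length(seq):
    """Length of the longest strictly increasing subsequence (O(n^2) dynamic program)."""
    best = []  # best[i] = length of the longest increasing subsequence ending at seq[i]
    for i, value in enumerate(seq):
        longest_before = 0
        for j in range(i):
            if seq[j] < value:
                longest_before = max(longest_before, best[j])
        best.append(longest_before + 1)
    return max(best, default=0)


example = [7, 2, 8, 1, 3, 4, 10, 6, 9, 5]
print(lis_length(example))  # prints 5
```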

Notice that the longest increasing subsequence is not necessarily unique:

2 3 4 6 9

is also an increasing subsequence of length 5. The longest increasing subsequence
problem goes back to Ulam [Ulam, 1961]. Ulam asked for the distribution of the
length of the longest monotone (increasing or decreasing) subsequence of a uniform
random permutation. While we will not go into the history here, a detailed account
of Ulam's problem and Monte Carlo methods can be found in [Hammersley, 1972].


Quite a bit of progress has been made concerning the distribution of the length
of the longest increasing subsequence, provided that the permutation is uniformly
distributed. Hammersley [Hammersley, 1972] showed that $E\ell(\pi) \sim c\sqrt{n}$, where $n$ is
the length of the permutation and $c$ is a constant. Vershik and Kerov [Vershik and
Kerov, 1977] and Logan and Shepp [Logan and Shepp, 1977] proved that the constant
$c$ is equal to 2. Their methods of proof relied on hard analysis of the asymptotics of
Young tableaux. Aldous and Diaconis [Aldous and Diaconis, 1999] give an interacting
particle process argument for $c = 2$. In addition, Baik, Deift, and Johansson [Baik
et al., 1999] showed that the fluctuations of the length of the longest increasing
subsequence for a uniform permutation are Tracy-Widom, on the order of $n^{1/6}$. More
specifically, they show that

$$\frac{\ell(\pi) - 2\sqrt{n}}{n^{1/6}} \xrightarrow{d} \chi,$$


where $\chi$ is a random variable with the Tracy-Widom distribution. The distribution
function for the Tracy-Widom distribution is

$$F(t) = \exp\left(-\int_t^{\infty} (x-t)\,u^2(x)\,dx\right),$$

where $u(x)$ is the solution of the Painlevé equation

$$u_{xx} = 2u^3 + xu.$$

See [Baik et al., 1999] for more background on the Tracy-Widom distribution.
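To make the centering and scaling in the Baik, Deift, and Johansson result concrete, here is a minimal sketch that computes the rescaled statistic $(\ell(\pi) - 2\sqrt{n})/n^{1/6}$ for a single uniform random permutation. The helper names, the seed, and the choice $n = 4096$ are illustrative assumptions; a single draw says nothing about the limiting distribution by itself.

```python
import bisect
import random


def lis_length(seq):
    """Length of the longest strictly increasing subsequence via binary search."""
    tails = []  # tails[i] = smallest last value of an increasing subsequence of length i + 1
    for value in seq:
        i = bisect.bisect_left(tails, value)
        if i == len(tails):
            tails.append(value)
        else:
            tails[i] = value
    return len(tails)


def rescaled_lis(perm):
    """Center by 2*sqrt(n) and scale by n**(1/6)."""
    n = len(perm)
    return (lis_length(perm) - 2 * n ** 0.5) / n ** (1 / 6)


rng = random.Random(0)      # fixed seed only so the sketch is repeatable
perm = list(range(1, 4097))
rng.shuffle(perm)           # uniform random permutation of length 4096
sample = rescaled_lis(perm)
print(sample)
```

Repeating the last four lines many times and plotting a histogram of the samples is one way to compare simulations against the Tracy-Widom law.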


Much less is known about the distribution of the length of the longest increasing 
subsequence of a random permutation distributed according to the Mallows measure. 
In [Mueller and Starr, 2013], Mueller and Starr proved a weak law of large numbers
result analogous to the Vershik-Kerov and Logan-Shepp results for the uniform case. 
To continue this work, we would like to bound the fluctuations of the length of the 
longest increasing subsequence of a Mallows permutation. As a first step in this 
direction, we use the modified Fisher-Yates algorithm to generate a random Mallows 
permutation, then use an algorithm called patience sorting to compute the length of 
the longest increasing subsequence of the generated permutation. 


5.5 Patience Sorting 


The presentation of patience sorting that we describe here follows the algorithm as
given by Aldous and Diaconis in [Aldous and Diaconis, 1999]. Patience sorting is a
type of one-person card game. Imagine that we have a deck of cards with the numbers
1, ..., n on them. We shuffle the deck thoroughly, and put the cards in a pile face
down. We turn the cards face up one at a time and put them into piles according
to the following rule:

A low card may be placed on top of a higher card (e.g. a 2 on top of a 7), but a higher
card must be placed into a new pile to the right of the current piles.

The object of the "game" is to finish with the fewest piles.

As a short example, let us suppose that we have a pile of cards labeled 1, ..., 6.
Let us shuffle them (uniformly at random) and suppose that we end up with the 
permutation 

4 1 3 2 6 5 

with the 4 on the top of the deck, and the 5 on the bottom. To begin the patience 
sorting algorithm, we start with the card 4, which will be the beginning of our first 
pile. The next card that we draw is a 1. Since this is less than 4, it can go on top of 
the four in the first pile, so that our piles look like 

1 

4 

Next we draw a 3. Since this is larger than 1, it cannot go on top of the first pile;
it must start a new pile. Now we have


1 3 
4 

We next add the 2 to the top of the second pile, since 2 < 3.

1 2 
4 3 

Adding in 6 requires us to make a new pile 

1 2 6 
4 3 
We can then place the last card, 5, on top of the third pile, giving us

1 2 5 
4 3 6 
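The steps of this example can be replayed with a small sketch of the greedy strategy, in which each card goes on the leftmost pile whose top card is larger and a new pile is started only when no such pile exists. The function name `greedy_patience_piles` and the list-of-lists representation are illustrative assumptions.

```python
def greedy_patience_piles(deck):
    """Play patience sorting greedily; each pile is a list whose last entry is the top card."""
    piles = []
    for card in deck:
        for pile in piles:             # scan the piles from left to right
            if card < pile[-1]:        # a lower card may go on top of a higher card
                pile.append(card)
                break
        else:
            piles.append([card])       # forced to start a new pile on the right
    return piles


piles = greedy_patience_piles([4, 1, 3, 2, 6, 5])
print(piles)       # [[4, 1], [3, 2], [6, 5]]
print(len(piles))  # 3
```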

Notice that we end up with 3 piles. Notice also that the length of the longest increasing
subsequence of the permutation is 3. One such subsequence is

1 3 6

and there are several more, but none of length more than 3. It turns out that this is
not a coincidence. The following theorem is due to Aldous and Diaconis [Aldous and
Diaconis, 1999].


Theorem 5.5.1. With a given deck $\pi$, patience sorting played with the greedy strategy
ends with exactly $\ell(\pi)$ piles. In addition, the game played with any legal strategy ends
with at least $\ell(\pi)$ piles.


Proof. Suppose that we have cards $a_1 < a_2 < \cdots < a_k$, an increasing subsequence
in our deck. Then under any legal strategy, each $a_i$ must be placed in a pile to the
right of $a_{i-1}$, since any card placed on top of $a_{i-1}$ must be less than the value on $a_{i-1}$.
This implies that the final number of piles must be at least $k$, and since this is the
length of an arbitrary increasing subsequence, the number of piles must be at least
$\ell(\pi)$. Furthermore, suppose we choose the greedy strategy, where we only start a new
pile if we are forced to. Suppose each time we put a card $a$ into any pile other than
the first pile, we place a pointer from that card to the card on the top of the pile
immediately to the left. Notice that this card will always be less than our current
card. At the end of the game, if we follow the pointers backward from the top card
on the last pile, we will have an increasing subsequence whose length is the number
of piles. □
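The pointer construction in the proof can be sketched as follows. Only the top card of each pile is tracked, together with its position in the deck, and the cards are assumed to be distinct; the function name `patience_lis` and the variable names are illustrative assumptions.

```python
import bisect


def patience_lis(deck):
    """Greedy patience sorting with back-pointers; returns one longest increasing subsequence."""
    tops = []                      # tops[p] = card currently on top of pile p (increasing in p)
    top_index = []                 # top_index[p] = position in the deck of that top card
    pointer = [None] * len(deck)   # pointer[t] = deck position of the card that card t points to
    for t, card in enumerate(deck):
        p = bisect.bisect_left(tops, card)   # leftmost pile whose top card is larger
        if p > 0:
            pointer[t] = top_index[p - 1]    # point to the top of the pile immediately to the left
        if p == len(tops):
            tops.append(card)                # forced to start a new pile
            top_index.append(t)
        else:
            tops[p] = card                   # card becomes the new top of pile p
            top_index[p] = t
    # Follow the pointers backward from the top card of the last pile.
    subsequence = []
    t = top_index[-1] if top_index else None
    while t is not None:
        subsequence.append(deck[t])
        t = pointer[t]
    return subsequence[::-1]


print(patience_lis([4, 1, 3, 2, 6, 5]))  # [1, 2, 5]
```

For the deck 4 1 3 2 6 5 this recovers the increasing subsequence 1 2 5, whose length equals the number of piles.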
[Figure: histogram titled "Length of the Longest Increasing Subsequence (n=10,000)"]

Figure 5.1: Length of the longest increasing subsequence of a uniform permutation of length 10,000


Using this theorem and the patience sorting algorithm, it is possible to have a
computer compute the length of the longest increasing subsequence of a Mallows
permutation. The Python code for such a program is included in the appendix. The
figures show histograms of the length of the longest increasing subsequence of a
permutation of length n = 10,000, run 200 times under the uniform distribution and
the Mallows distribution for varying values of q.

Unfortunately, we were unable to extract much useful information from our
simulations, because we did not have enough computing power to run the algorithm
for large enough permutations. We computed statistics (mean, variance, skewness,
kurtosis) for our permutations in hopes of matching the experimental data to what
would be expected for a distribution of Tracy-Widom type, but the results were
inconclusive.


5.6 Four Square Problem 


[Figure: histogram titled "Length of the Longest Increasing Subsequence (n=10,000)"]

Figure 5.2: Length of the longest increasing subsequence of a Mallows permutation of length 10,000
with q = 0.99

[Figure: histogram titled "Length of the Longest Increasing Subsequence (n=10,000)"]

Figure 5.3: Length of the longest increasing subsequence of a Mallows permutation of length 10,000
with q = 0.88

[Figure: the unit square divided into four rectangles, with R21 and R22 in the top row and R11 and R12 in the bottom row]

Figure 5.4: An example of a decomposition of $[0,1]^2$ into four rectangles $R_{11}, R_{12}, R_{21}, R_{22}$.

We would now like to give an idea as to how the q-Stirling's formula arises in the
problem of bounding the fluctuations of the length of the longest increasing
subsequence in a Mallows random permutation. This work is ongoing, but an important
step is discussed in this section. A random permutation (Mallows or otherwise) can
be viewed as a set of points in a rectangle, in the following way:

Consider $n$ points $(x_i, y_i)$ in the rectangle $[0,1] \times [0,1]$ in $\mathbb{R}^2$ with all coordinates
distinct. The set of points specifies a permutation $\pi \in S_n$ by the rule: "The point with
the $i$th smallest $y$-coordinate has the $\pi(i)$th smallest $x$-coordinate". Hence, given a
set of points in a box, we can obtain a permutation from these points. Depending on
how the points are distributed in the box, we can obtain permutations with different
distributions. As an example, if the points are uniformly distributed in the box, we
obtain a uniform random permutation.
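The rule for reading a permutation off a point configuration can be sketched in a few lines. The function name `permutation_from_points` and the sample coordinates are illustrative assumptions; the coordinates are assumed to be distinct, as in the text.

```python
def permutation_from_points(points):
    """Return [pi(1), ..., pi(n)]: the point with the i-th smallest y has the pi(i)-th smallest x."""
    xs_sorted = sorted(x for x, _ in points)
    x_rank = {x: rank for rank, x in enumerate(xs_sorted, start=1)}
    by_y = sorted(points, key=lambda point: point[1])   # order the points by y-coordinate
    return [x_rank[x] for x, _ in by_y]


points = [(0.2, 0.9), (0.5, 0.1), (0.8, 0.4)]
print(permutation_from_points(points))  # [2, 3, 1]
```

Here the lowest point has the second smallest x-coordinate, the middle point has the largest, and the highest point has the smallest.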


To begin to bound the fluctuations for a Mallows random permutation, we assume
that we have n points in the unit square distributed so that they give a Mallows
random permutation. We then divide the square into a large number of small subsquares.
If the size of each subsquare is small enough, the points in the subsquare will be
approximately uniformly distributed. We then hope to couple our model to a model of
Deuschel and Zeitouni [Deuschel and Zeitouni, 1999] to bound the fluctuations.


This argument will not be presented here. For now, we will simply look at the
division of the unit square into four subrectangles to illustrate the use of the
q-Stirling's formula. Suppose that we divide the unit square into four rectangles,
which we refer to as $R_{11}, R_{12}, R_{21}, R_{22}$ (see Figure 5.4). Assuming that we have $n$
total points in the square, let $n_{11}$ denote the number of points in $R_{11}$, $n_{12}$ denote
the number of points in $R_{12}$, and so on. We then have $n_{11} + n_{12} + n_{21} + n_{22} = n$.
Denote the area of rectangle $R_{ij}$ as $p_{ij}$. Consider the distribution of the points
$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ in the square. If the points are distributed uniformly
(i.e. if $q = 1$), then the probability of the event

$$\bigcap_{i,j=1}^{2} \left\{ \#\{k : (X_k, Y_k) \in R_{ij}\} = n_{ij} \right\}$$

is given by the usual multinomial formula

$$\frac{n!}{n_{11}!\,n_{12}!\,n_{21}!\,n_{22}!}\; p_{11}^{n_{11}}\, p_{12}^{n_{12}}\, p_{21}^{n_{21}}\, p_{22}^{n_{22}}. \tag{5.6}$$
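For the uniform case, the multinomial formula (5.6) is easy to evaluate directly. The sketch below is illustrative only; the function name and the sample areas are assumptions, and the counts and areas are listed in the order 11, 12, 21, 22.

```python
from math import factorial


def multinomial_probability(counts, areas):
    """Evaluate n! / (n11! n12! n21! n22!) * p11^n11 * p12^n12 * p21^n21 * p22^n22."""
    n = sum(counts)
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)   # each intermediate quotient is an exact integer
    probability = float(coefficient)
    for c, p in zip(counts, areas):
        probability *= p ** c
    return probability


areas = (0.1, 0.2, 0.3, 0.4)           # hypothetical rectangle areas summing to 1
print(multinomial_probability((1, 1, 1, 1), (0.25, 0.25, 0.25, 0.25)))  # 0.09375
```

Summing the formula over all ways of splitting $n$ points among the four rectangles gives 1, which is a useful check when experimenting with other areas.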


For $0 < q < 1$ and $q \neq 1$ (in other words, a Mallows random permutation), the correct
probability is obtained by multiplying the above expression by the factor

$$q^{n_{12} n_{21}}\; \frac{[n_{11}+n_{12}]!\,[n_{11}+n_{21}]!\,[n_{12}+n_{22}]!\,[n_{21}+n_{22}]!}{(n_{11}+n_{12})!\,(n_{11}+n_{21})!\,(n_{12}+n_{22})!\,(n_{21}+n_{22})!} \cdot \frac{(n_{11})!\,(n_{12})!\,(n_{21})!\,(n_{22})!}{[n_{11}]!\,[n_{12}]!\,[n_{21}]!\,[n_{22}]!} \cdot \frac{n!}{[n]!}, \tag{5.7}$$

where $[a]!$ denotes the $q$-factorial as defined earlier. Combining (5.6) and (5.7) and using
the notation $\{n\}! := [n]!/n!$, we have

$$P\left(\bigcap_{i,j=1}^{2} \left\{ \#\{k : (X_k, Y_k) \in R_{ij}\} = n_{ij} \right\}\right) = n! \left(\prod_{i,j=1}^{2} \frac{p_{ij}^{n_{ij}}}{n_{ij}!}\right) \frac{\{n_{11}+n_{12}\}!\,\{n_{11}+n_{21}\}!\,\{n_{12}+n_{22}\}!\,\{n_{21}+n_{22}\}!}{\{n_{11}\}!\,\{n_{12}\}!\,\{n_{21}\}!\,\{n_{22}\}!\,\{n_{11}+n_{12}+n_{21}+n_{22}\}!}\; q^{n_{12} n_{21}}. \tag{5.8}$$
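The correction factor is straightforward to evaluate for small counts. The sketch below assumes the standard definition $[n]! = \prod_{k=1}^{n} (1-q^k)/(1-q)$ for the $q$-factorial; the function names are illustrative assumptions and floating-point products are only suitable for modest $n$.

```python
import math


def q_factorial(n, q):
    """[n]! = prod_{k=1}^{n} (1 - q**k) / (1 - q), assuming q != 1."""
    result = 1.0
    for k in range(1, n + 1):
        result *= (1 - q ** k) / (1 - q)
    return result


def brace_factorial(n, q):
    """{n}! := [n]! / n!."""
    return q_factorial(n, q) / math.factorial(n)


def mallows_factor(n11, n12, n21, n22, q):
    """The factor (5.7), written in terms of {.}! as in (5.8)."""
    numerator = (brace_factorial(n11 + n12, q) * brace_factorial(n11 + n21, q)
                 * brace_factorial(n12 + n22, q) * brace_factorial(n21 + n22, q))
    denominator = (brace_factorial(n11, q) * brace_factorial(n12, q)
                   * brace_factorial(n21, q) * brace_factorial(n22, q)
                   * brace_factorial(n11 + n12 + n21 + n22, q))
    return q ** (n12 * n21) * numerator / denominator


print(q_factorial(3, 0.5))                  # [3]! = 1 * 1.5 * 1.75 = 2.625
print(mallows_factor(1, 1, 1, 1, 0.999999)) # close to 1 when q is close to 1
```

As $q \to 1$ every $\{n\}!$ tends to 1, so the factor tends to 1 and the multinomial formula is recovered.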

This formula is somewhat involved, but it is explicit. By applying the q-Stirling's
formula, we can obtain the exact asymptotics for the probability distribution in the
limit $n \to \infty$, with $q = e^{-\beta/n}$.

Recall that Stirling's formula says that

$$n! \sim \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}$$

and the q-Stirling's formula states that for $q = e^{-\beta/n}$

$$\ln(\{n\}!) = nA(\beta) + B(\beta) + R_n(\beta)$$

where

$$A(\beta) = \int_0^1 \ln\left(\frac{1 - e^{-\beta x}}{\beta x}\right) dx, \qquad B(\beta) = \frac{\beta}{2} + \frac{1}{2}\ln\left(\frac{1 - e^{-\beta}}{\beta}\right),$$

and $R_n(\beta)$ is a remainder term which goes to zero as $n \to \infty$. Before we can apply
these asymptotics to equation (5.8), we need a preliminary lemma.


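Lemma 5.6.1. For $q = e^{-\beta/n}$,

$$\ln(\{n_{ij}\}!) = n_{ij}\, A\!\left(\beta\frac{n_{ij}}{n}\right) + B\!\left(\beta\frac{n_{ij}}{n}\right) + R_{n_{ij}}\!\left(\beta\frac{n_{ij}}{n}\right).$$

This lemma is easily proved using the q-Stirling's formula and rewriting $q$ as
$q = e^{-\beta'/n_{ij}}$, where $\beta' = \beta n_{ij}/n$.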
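The q-Stirling's formula and the rescaling used in the lemma can be checked numerically. In the sketch below, the closed forms $A(\beta) = \int_0^1 \ln\frac{1-e^{-\beta x}}{\beta x}\,dx$ and $B(\beta) = \frac{\beta}{2} + \frac{1}{2}\ln\frac{1-e^{-\beta}}{\beta}$ are assumptions obtained from an Euler-Maclaurin expansion of $\ln(\{n\}!) = \sum_{k=1}^{n} \ln\frac{1-q^k}{k(1-q)}$; the function names, the midpoint rule, and the values of $n$ and $\beta$ are likewise illustrative.

```python
import math


def log_brace_factorial(m, q):
    """ln({m}!) = sum_{k=1}^{m} ln((1 - q**k) / (k * (1 - q))), computed in log space."""
    log_q = math.log(q)
    log_one_minus_q = math.log(-math.expm1(log_q))
    return sum(math.log(-math.expm1(k * log_q)) - log_one_minus_q - math.log(k)
               for k in range(1, m + 1))


def g(t):
    """ln((1 - e^{-t}) / t), extended continuously by g(0) = 0."""
    return math.log(-math.expm1(-t) / t) if t != 0 else 0.0


def A(beta, steps=20000):
    """Midpoint-rule approximation of the integral of g(beta * x) over [0, 1] (assumed form of A)."""
    h = 1.0 / steps
    return h * sum(g(beta * (i + 0.5) * h) for i in range(steps))


def B(beta):
    """Assumed form of the constant-order term."""
    return beta / 2 + 0.5 * g(beta)


n, beta = 2000, 1.0
q = math.exp(-beta / n)

exact = log_brace_factorial(n, q)
approx = n * A(beta) + B(beta)

m = n // 4                       # plays the role of n_ij
beta_prime = beta * m / n        # rescaled parameter from the lemma
exact_m = log_brace_factorial(m, q)
approx_m = m * A(beta_prime) + B(beta_prime)

print(exact - approx, exact_m - approx_m)   # both differences should be small
```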


To break down the asymptotics of (5.8) a bit, let

$$W_q = q^{n_{12} n_{21}}\; \frac{\{n_{11}+n_{12}\}!\,\{n_{11}+n_{21}\}!\,\{n_{12}+n_{22}\}!\,\{n_{21}+n_{22}\}!}{\{n_{11}\}!\,\{n_{12}\}!\,\{n_{21}\}!\,\{n_{22}\}!\,\{n_{11}+n_{12}+n_{21}+n_{22}\}!}.$$

In addition, let $\nu_{ij} = n_{ij}/n$. For this analysis, we will assume that all $\nu_{ij}$ are of order 1,
so that we are not letting any of the rectangles be too small. If this is the case, we can
make the approximation

$$\ln\{n_{ij}\}! \approx n\nu_{ij}\, A(\beta\nu_{ij}).$$
Using this assumption, we have the following lemma.

Lemma 5.6.2. For $\nu_{11}, \nu_{12}, \nu_{21}, \nu_{22} > 0$, we have

$$\begin{aligned}
\lim_{n\to\infty} \frac{1}{n} \ln(W_q) = {} & -\beta\nu_{12}\nu_{21} + (\nu_{11}+\nu_{12})A(\beta[\nu_{11}+\nu_{12}]) + (\nu_{11}+\nu_{21})A(\beta[\nu_{11}+\nu_{21}]) \\
& + (\nu_{12}+\nu_{22})A(\beta[\nu_{12}+\nu_{22}]) + (\nu_{21}+\nu_{22})A(\beta[\nu_{21}+\nu_{22}]) \\
& - \nu_{11}A(\beta\nu_{11}) - \nu_{12}A(\beta\nu_{12}) - \nu_{21}A(\beta\nu_{21}) - \nu_{22}A(\beta\nu_{22}) - A(\beta).
\end{aligned} \tag{5.9}$$

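Proof. By definition,

$$\lim_{n\to\infty} \frac{1}{n}\ln(W_q) = \lim_{n\to\infty} \frac{1}{n} \ln\left( \frac{q^{\nu_{12}n\,\nu_{21}n}\,\{\nu_{11}n+\nu_{12}n\}!\,\{\nu_{11}n+\nu_{21}n\}!\,\{\nu_{12}n+\nu_{22}n\}!\,\{\nu_{21}n+\nu_{22}n\}!}{\{\nu_{11}n\}!\,\{\nu_{12}n\}!\,\{\nu_{21}n\}!\,\{\nu_{22}n\}!\,\{\nu_{11}n+\nu_{12}n+\nu_{21}n+\nu_{22}n\}!} \right).$$

Using the q-Stirling formula and the previous lemma, we get

$$\begin{aligned}
= \lim_{n\to\infty} \frac{1}{n}\Big( & \ln\left(q^{\nu_{12}n\,\nu_{21}n}\right) + (\nu_{11}n+\nu_{12}n)A(\beta(\nu_{11}+\nu_{12})) + \cdots + (\nu_{21}n+\nu_{22}n)A(\beta(\nu_{21}+\nu_{22})) \\
& - \nu_{11}nA(\beta\nu_{11}) - \nu_{12}nA(\beta\nu_{12}) - \nu_{21}nA(\beta\nu_{21}) - \nu_{22}nA(\beta\nu_{22}) \\
& - (\nu_{11}n+\nu_{12}n+\nu_{21}n+\nu_{22}n)A(\beta(\nu_{11}+\nu_{12}+\nu_{21}+\nu_{22})) \Big).
\end{aligned} \tag{5.10}$$

Distributing the $1/n$ and taking the limit immediately gives us what we need except for
the first and last terms. Consider just

$$\lim_{n\to\infty} \frac{1}{n} \ln\left(q^{\nu_{12}n\,\nu_{21}n}\right).$$

Since $q = e^{-\beta/n}$, this is

$$= \lim_{n\to\infty} \frac{1}{n} \ln\left(e^{-\beta\nu_{12}\nu_{21}n}\right) = \frac{1}{n}\left(-\beta\nu_{12}n\nu_{21}\right) = -\beta\nu_{12}\nu_{21}.$$

This gives us the first term in our lemma. The last term, apart from its minus sign, is
equal to

$$\lim_{n\to\infty} \frac{1}{n}(\nu_{11}n+\nu_{12}n+\nu_{21}n+\nu_{22}n)A(\beta(\nu_{11}+\nu_{12}+\nu_{21}+\nu_{22})) = \lim_{n\to\infty} (\nu_{11}+\nu_{12}+\nu_{21}+\nu_{22})A(\beta(\nu_{11}+\nu_{12}+\nu_{21}+\nu_{22})) = A(\beta)$$

since $\sum_{i,j=1}^{2} \nu_{ij} = 1$. Putting all terms together proves the lemma. □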
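The limit in the lemma can be compared with the exact value of $\frac{1}{n}\ln W_q$ at a finite $n$. This sketch assumes $A(\beta) = \int_0^1 \ln\frac{1-e^{-\beta x}}{\beta x}\,dx$ (derived from the definition of $\{n\}!$, not copied from the text) and uses hypothetical counts and $\beta$; all function names are illustrative.

```python
import math


def log_brace_factorial(m, q):
    """ln({m}!) = sum_{k=1}^{m} ln((1 - q**k) / (k * (1 - q)))."""
    log_q = math.log(q)
    log_one_minus_q = math.log(-math.expm1(log_q))
    return sum(math.log(-math.expm1(k * log_q)) - log_one_minus_q - math.log(k)
               for k in range(1, m + 1))


def A(beta, steps=4000):
    """Midpoint rule for the integral of ln((1 - e^{-beta x}) / (beta x)) over [0, 1] (assumed form)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = beta * (i + 0.5) * h
        total += math.log(-math.expm1(-t) / t)
    return h * total


def log_W(n11, n12, n21, n22, q):
    """ln(W_q) computed exactly from the definition of W_q."""
    lb = lambda m: log_brace_factorial(m, q)
    return (n12 * n21 * math.log(q)
            + lb(n11 + n12) + lb(n11 + n21) + lb(n12 + n22) + lb(n21 + n22)
            - lb(n11) - lb(n12) - lb(n21) - lb(n22)
            - lb(n11 + n12 + n21 + n22))


def limit_formula(v11, v12, v21, v22, beta):
    """Right-hand side of (5.9)."""
    term = lambda v: v * A(beta * v)
    return (-beta * v12 * v21
            + term(v11 + v12) + term(v11 + v21) + term(v12 + v22) + term(v21 + v22)
            - term(v11) - term(v12) - term(v21) - term(v22)
            - A(beta))


n, beta = 4000, 1.5
counts = (400, 800, 1200, 1600)          # hypothetical n11, n12, n21, n22
nu = tuple(c / n for c in counts)
q = math.exp(-beta / n)

finite_n = log_W(*counts, q) / n
limit = limit_formula(*nu, beta)
print(finite_n, limit)                   # the two values should nearly agree
```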

It is worth noting that the asymptotics for

$$\frac{1}{n} \ln W_q$$

given by this lemma give us an equation analogous (and very similar) to equation (6)
in [Starr and Walters, 2015].


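Combining these asymptotics with the asymptotics for

$$n! \prod_{i,j=1}^{2} \frac{p_{ij}^{n_{ij}}}{n_{ij}!}$$

gives, to leading exponential order with the $\nu_{ij}$ held fixed,

$$\lim_{n\to\infty} \frac{1}{n} \ln P\left(\bigcap_{i,j=1}^{2} \left\{ \#\{k : (X_k, Y_k) \in R_{ij}\} = n_{ij} \right\}\right) = \Lambda - A(\beta), \tag{5.11}$$

where

$$\begin{aligned}
\Lambda = {} & \sum_{i,j=1}^{2} \left(\nu_{ij}\ln(p_{ij}) - \nu_{ij}\ln(\nu_{ij})\right) \\
& - \beta\nu_{12}\nu_{21} + (\nu_{11}+\nu_{12})A(\beta[\nu_{11}+\nu_{12}]) + (\nu_{11}+\nu_{21})A(\beta[\nu_{11}+\nu_{21}]) \\
& + (\nu_{12}+\nu_{22})A(\beta[\nu_{12}+\nu_{22}]) + (\nu_{21}+\nu_{22})A(\beta[\nu_{21}+\nu_{22}]) \\
& - \nu_{11}A(\beta\nu_{11}) - \nu_{12}A(\beta\nu_{12}) - \nu_{21}A(\beta\nu_{21}) - \nu_{22}A(\beta\nu_{22}).
\end{aligned} \tag{5.12}$$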
5.7 Conclusion and Outlook


As previously mentioned, the results in this section are preliminary steps toward 
bounding the fluctuations of the length of the longest increasing subsequence of a 
Mallows permutation. The next step is to use the approach of the four square problem 
to solve a nine square problem. Once the asymptotics are computed for that problem, 
we can generalize to a large number of small squares and obtain a local central limit 
theorem for the counts on small subsquares. After that, we hope to couple our model 
to the model of Deuschel and Zeitouni [Deuschel and Zeitouni, 1999] and then use
Talagrand’s isoperimetric inequality to bound the fluctuations. These results will 
appear in a future work. 
A Python code: Simulating a Mallows Random Permutation


'''
permutation.py

@author: Meg Walters
'''

import bisect
import random

import numpy as np
import matplotlib.pyplot as plt


def patience_sort(sequence):
    # Runs the greedy patience sorting algorithm, keeping all of the
    # stacks in a list of lists (the top card of a stack is its first entry).
    # Input:
    #   sequence: list of distinct numbers to sort
    # Output:
    #   len(stacks): the number of stacks
    stacks = []
    for x in sequence:  # iterate through the list of numbers
        temp_stack = [x]  # put the number in a temporary stack
        # Lists compare lexicographically, so for distinct numbers this
        # finds the leftmost stack whose top card is larger than x.
        i = bisect.bisect_left(stacks, temp_stack)
        if i != len(stacks):
            # x is not larger than all numbers on top of the stacks
            stacks[i].insert(0, x)  # put x on the appropriate stack
        else:
            stacks.append(temp_stack)  # create a new stack
    return len(stacks)


def fisher_yates(length):
    # Uses the Fisher-Yates algorithm to create a uniform random permutation.
    # Input:
    #   length: desired length of the permutation
    # Output:
    #   L: random permutation
    L = [1]  # begin with only 1 in the list
    for i in range(length - 1):
        # iterate to create a list of length 'length'
        m = i + 2  # initialize/update m
        k = random.randint(1, m)  # random integer between 1 and m
        if k == m:
            L.append(m)  # append m to the end of the list
        else:
            L.insert(k - 1, m)  # insert m in the (k-1)th place in the list
    return L  # return the random permutation


def mallows(length, q):
    # Uses the modified Fisher-Yates algorithm to create a permutation
    # distributed according to the Mallows measure.
    # Input:
    #   length: desired length of the permutation
    #   q: Mallows parameter (the geometric variable has success
    #      probability 1 - q)
    # Output:
    #   L: permutation
    L = [1]  # begin with only 1 in the list
    for i in range(length - 1):
        # iterate to create a list of length 'length'
        m = i + 2  # initialize/update m
        # generate a geometric random integer with probability p = 1 - q
        x = int(np.random.geometric(p=1 - q))
        y = 1 + ((x - 1) % m)  # find y based on Mallows
        if y == 1:
            L.append(m)  # append m to the end of the list
        else:
            # m belongs in position m + 1 - y counting from 1, which is
            # index m - y for Python's 0-based insert, so that m causes
            # y - 1 inversions
            L.insert(m - y, m)
    return L


length_list = 10000  # change this variable to change n
length_data = 200    # change this value to change the number of times
                     # the program should run to collect data

data = []  # initialize list to hold data

q = .8  # change this value to change the Mallows q

# create data
for i in range(length_data):
    data.append(patience_sort(mallows(length_list, q)))

# create histogram for given data
fig = plt.figure()
ax = fig.add_subplot(111)

n, bins, patches = ax.hist(data, 30, density=False,
                           facecolor='green', histtype='bar', align='mid')
ax.grid(True)

plt.title('LLIS of a Mallows Permutation')
plt.xlabel('Length of the Longest Increasing Subsequence')
plt.ylabel('Number of occurrences')
plt.show()